Re: Question for the Group
- From: JF Mezei <jfmezei.spamnot@xxxxxxxxxxxxx>
- Date: Fri, 15 Jun 2007 06:59:28 -0400
Michael Kraemer wrote:
but in our case - IIRC - the system wasn't even meant
to be bullet proof for the big disaster, apart from - maybe -
disk mirroring. It just turned out to be less reliable in total
and couldn't stand even a little disaster. More than a decade
ago as well as last year.
Then your cluster was not configured properly.
When you start to consider all the possible failure modes inside and between nodes and the rest of the world, you start to understand why it becomes very important to have nodes freeze and/or voluntarily crash during a problem where nodes lose sight of each other.
Consider just disk arrays accessed directly by many nodes. If nodes lose sight of each other and continue to operate, it means that node1's locked record isn't seen as "locked" by node2 which will then mess with it.
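To make that failure concrete, here is a toy sketch of two nodes that each keep their own view of who holds a record lock. It is not how the VMS lock manager is implemented; the `Node` class, its fields, and the record names are all hypothetical and exist only to show what goes wrong when partitioned nodes keep running.

```python
class Node:
    """Toy cluster member with a purely local view of record locks (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.known_locks = {}  # record id -> holder name, as this node believes
        self.peers = []        # nodes this node can currently reach

    def lock(self, record):
        holder = self.known_locks.get(record)
        if holder is not None and holder != self.name:
            return False       # someone else is known to hold the lock
        self.known_locks[record] = self.name
        for peer in self.peers:  # only reachable peers learn about the lock
            peer.known_locks[record] = self.name
        return True


node1, node2 = Node("node1"), Node("node2")

# Healthy cluster: the nodes see each other, so the second request is refused.
node1.peers, node2.peers = [node2], [node1]
healthy = (node1.lock("rec42"), node2.lock("rec42"))

# Partition: the nodes lose sight of each other but keep operating.
node1.peers, node2.peers = [], []
partitioned = (node1.lock("rec7"), node2.lock("rec7"))

print(healthy)      # (True, False)
print(partitioned)  # (True, True): both nodes believe they own rec7
```

Freezing or crashing the nodes that end up on the wrong side of the partition is what prevents the second case.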
And the voting scheme is there as a tool to let the system manager distinguish the critical from the less critical nodes in the cluster. It also allows one to use deductive reasoning to determine the extent of a fault.
Say a server has 2 votes, and 2 workstations have 1 vote each. Normally, a workstation might have 0 votes. But in this case, having 1 vote allows the server to know whether it has lost all ethernet connectivity or just lost one node. That is: if both workstations go, it is likely that the server has lost ethernet completely, but if only one of the 2 workstations goes, then the server still sees the vote of the second workstation and can be parametrised to continue working, based on the voting scheme designed by the system manager.
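The arithmetic behind that example can be sketched as follows. This is an illustration only: the node names and helper functions are hypothetical, and the quorum formula, (EXPECTED_VOTES + 2) / 2 with integer division, is an assumption about how the quorum is derived that should be checked against the cluster documentation before relying on it.

```python
# Hypothetical vote assignment from the example: one server, two workstations.
VOTES = {"server": 2, "ws1": 1, "ws2": 1}


def quorum(expected_votes: int) -> int:
    # Assumed formula: (EXPECTED_VOTES + 2) / 2, truncated to an integer.
    return (expected_votes + 2) // 2


def has_quorum(visible_nodes, votes=VOTES) -> bool:
    """True if the nodes that can still see each other hold enough votes."""
    expected = sum(votes.values())
    seen = sum(votes[name] for name in visible_nodes)
    return seen >= quorum(expected)


print(quorum(sum(VOTES.values())))    # 3
print(has_quorum({"server", "ws1"}))  # True: only ws2 was lost
print(has_quorum({"server"}))         # False: both workstations out of sight
print(has_quorum({"ws1", "ws2"}))     # False: workstations alone cannot carry on
```

With these numbers the server keeps running when a single workstation disappears, but stalls when it loses sight of both, which is the case where it has most likely lost ethernet itself.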
VMS can be criticised in many ways (especially its marketing), but its clustering is far more advanced than that of any competitor in the field. It is not idiot proof, however, and still needs someone to sit down and think through the design of the cluster and voting scheme.