Re: Partitioned cluster question (reboot during lost quorum)
- From: hoffman@xxxxxxxxxxxxxxxxxxxx ()
- Date: Tue, 18 Apr 2006 19:47:39 GMT
In article <44453DE5.2759B34E@xxxxxxxxxxxx>, JF Mezei <jfmezei.spamnot@xxxxxxxxxxxx> writes:
|> Say you have a 3 node cluster: node1, node2, node3
|>
|> Each node has 1 vote. You need 2 votes to maintain quorum.
|>
|> You unplug the ethernet from node3. node1 and node2 still have quorum
|> and happily chug along. Node3 realises it has lost quorum and freezes.
|>
|> What happens if at this point, you reboot node3 ?
Nothing bad happens.
If you were to reconnect the network after Node1 and Node2 have
decided that Node3 is gone and have effectively tossed it out of
the cluster, Node3 would detect this upon the return of network
connectivity, and would CLUEXIT and reboot itself. Once Node3 has
rebooted, data synchronization would be re-established, and Node3
could fully rejoin the cluster.
|> Node3 would not see 1 or 2 and think it was the first node rebooting
|> after say a power failure. Right ?
It could. But nothing bad happens.
|> Wouldn't NODE3 then form its instance of the cluster with its own vote
|> and a quorum of 1 ?
Only if the core cluster settings have been misconfigured.
If you would like to learn how to properly set the VOTES and the
EXPECTED_VOTES parameters, do please consider reviewing the text on
these settings in the OpenVMS FAQ.
|> Or is there anything which would prevent node3 from rebooting ?
It will reboot, but will hang awaiting quorum.
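For reference, here is a minimal sketch of the quorum arithmetic behind
that answer, written in Python rather than anything OpenVMS-specific.
It assumes the quorum formula as given in the OpenVMS Cluster
documentation, (EXPECTED_VOTES + 2) / 2 with the fraction discarded,
and it ignores quorum disks and any adjustments the connection manager
makes at run time. The function names are made up for illustration.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES (assumed formula, truncated)."""
    return (expected_votes + 2) // 2


def has_quorum(votes_present: int, expected_votes: int) -> bool:
    """True if the votes a node can currently see meet quorum."""
    return votes_present >= quorum(expected_votes)


# Three nodes, VOTES=1 each, EXPECTED_VOTES=3 on every node.
print(quorum(3))         # 2
print(has_quorum(2, 3))  # node1 + node2 together: True
print(has_quorum(1, 3))  # node3 alone after reboot: False, so it waits

# Misconfigured case: node3 boots with EXPECTED_VOTES=1.
print(has_quorum(1, 1))  # True, node3 would form its own cluster
```

With EXPECTED_VOTES set to the real total of 3, the rebooted Node3 sees
one vote against a quorum of 2 and waits. Only the bogus setting in the
last line lets it proceed alone.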
|>
|> Once node3 has rebooted, thinking it was alone in the cluster, what
|> happens when you plug the ethernet back in and all of a sudden, node3
|> sees nodes 1 and 2 ? Will nodes 1 and 2 succeed in convincing node3 to
|> commit suicide ? Will node3 convince nodes 1 and 2 to commit suicide
|> (since it is a more recent incarnation of the cluster) ?
It won't get that far.
|> Or will what is essentially a partitioned cluster continue to exist as 2
|> separate cluster instances ?
Barring bogus parameter settings, OpenVMS Cluster software handles
this case correctly.
|> If node3 is in fact allowed to reboot all alone in its own little world,
|> it seems to me that shops should have, as part of their operator
|> documentation, strict guidelines not to reboot machines that are
|> frozen. (sort of counter intuitive in a windows environment) since you'd
|> want to first fix/investigate the problem of lost quorum.
A correctly configured cluster correctly deals with this case.
On at least a few occasions, I have mentioned that attempting to
bypass the quorum scheme -- whether through incorrect or "creative"
parameter settings -- can lead to badness, and one of the central
cases of this "badness" involves the inability of nodes to establish
complete cluster connectivity as a node bootstraps. Within this
environment, incorrect or "creative" VOTES and EXPECTED_VOTES
parameter settings can quickly lead to massive disk corruptions,
and these corruptions can arise long before you could ever log
into the system to repair the settings. Once connections have
been established, OpenVMS corrects this case. (We've discussed
ways of notifying the system manager about the more bogus of these
settings, but that's not presently an option.) Again, please see
the OpenVMS FAQ for details on VOTES and EXPECTED_VOTES.
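As a rough illustration of what "bogus" means here, the sketch below
checks whether a given set of vote assignments would let two disjoint
groups of nodes each reach quorum at the same time, which is the
partitioned case that leads to corruption. It is Python, not an OpenVMS
tool. It assumes the documented quorum formula of
(EXPECTED_VOTES + 2) / 2 with the fraction discarded, assumes every
node carries the same EXPECTED_VOTES value, and ignores quorum disks.
The names quorum and partition_possible are hypothetical.

```python
from itertools import combinations


def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES (assumed formula, truncated)."""
    return (expected_votes + 2) // 2


def partition_possible(votes: dict[str, int], expected_votes: int) -> bool:
    """True if some split of the nodes leaves both sides with quorum."""
    needed = quorum(expected_votes)
    total = sum(votes.values())
    names = list(votes)
    # Try every proper, non-empty group of nodes against the remainder.
    for size in range(1, len(names)):
        for group in combinations(names, size):
            inside = sum(votes[name] for name in group)
            if inside >= needed and total - inside >= needed:
                return True
    return False


votes = {"node1": 1, "node2": 1, "node3": 1}
print(partition_possible(votes, expected_votes=3))  # False
print(partition_possible(votes, expected_votes=1))  # True
```

The second call is the "creative" setting: with EXPECTED_VOTES at 1,
any single node satisfies quorum, so a node that boots without seeing
the others can go ahead on its own.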
I'm certain you are familiar with the following two URLs, but I will
include them here for completeness:
<http://www.hp.com/go/openvms/doc/>
<http://www.hp.com/go/openvms/faq/>