Re: Partitioned cluster question (reboot during lost quorum)
- From: helbig@xxxxxxxxxxxxxxxxxxxxxxxx (Phillip Helbig---remove CLOTHES to reply)
- Date: Wed, 19 Apr 2006 12:05:37 +0000 (UTC)
In article <44460193.24E758F7@xxxxxxxxxxxxx>, JF Mezei
<jfmezei.spamnot@xxxxxxxxxxxxx> writes:
Michael Moroney wrote:
Set it to a value that is
equal to the sum of the vote parameters of all cluster members,
plus any votes that are contributed by the quorum disk.
The HELP for EXPECTED_VOTES does not mention that EXPECTED_VOTES/2+1 is
the minimum number of votes needed before a node can proceed with the
boot. That is the really important portion here.
HELP can't say everything. OK, it could say more in some cases.
Looking at the 7.1 documentation (since I have that on CD in a drive on
a VAX 4000 and it is in BOOKREADER format and I'm sitting at my trusty
21-inch DEC monitor), it is in Chapter 2.3.5:
2.3.5 Calculating Cluster Votes
The quorum algorithm operates as follows:
Step Action
1 When nodes in the OpenVMS Cluster boot, the connection manager uses
the largest value for EXPECTED_VOTES of all systems present to derive
an estimated quorum value according to the following formula:
Estimated quorum = (EXPECTED_VOTES + 2)/2, rounded down
2 During a state transition, the connection manager dynamically computes
the cluster quorum value to be the maximum of the following:
· The current cluster quorum value
· The largest of the values calculated from the following formula,
where the EXPECTED_VOTES value is the largest value specified by
any node in the cluster:
QUORUM = (EXPECTED_VOTES + 2)/2, rounded down
· The value calculated from the following formula, where the VOTES
system parameter is the total votes held by all cluster members:
QUORUM = (VOTES + 2)/2, rounded down
3 The connection manager compares the cluster votes value to the cluster
quorum value and determines what action to take based on the following
conditions:
WHEN the total number of cluster votes is equal to at least the quorum
value,
THEN the OpenVMS Cluster system continues running.

WHEN the current number of cluster votes drops below the quorum value
(because of computers leaving the cluster),
THEN the remaining OpenVMS Cluster members suspend all process activity
and all I/O operations to cluster-accessible disks and tapes until
sufficient votes are added (that is, enough computers have joined the
OpenVMS Cluster) to bring the total number of votes to a value greater
than or equal to quorum.
Note: When a node leaves the OpenVMS Cluster system, the connection
manager does not decrease the cluster quorum value. In fact, the
connection manager never decreases the cluster quorum value; it only
increases it. However, system managers can decrease the value according
to the instructions in Section 8.6.2.
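
To make the three steps quoted above concrete, here is a rough sketch in Python. It is only an illustration of the arithmetic in the documentation, not anything the connection manager exposes; the function names and the sample three-node cluster (1 vote each, EXPECTED_VOTES of 3) are made up for the example.

```python
def quorum(votes: int) -> int:
    """(votes + 2) / 2, rounded down, as in the quoted formula."""
    return (votes + 2) // 2


def recompute_quorum(current_quorum, expected_votes_by_node, votes_by_node):
    """Step 2: take the maximum of the three candidates, so quorum never drops."""
    return max(
        current_quorum,
        quorum(max(expected_votes_by_node)),  # largest EXPECTED_VOTES seen
        quorum(sum(votes_by_node)),           # total votes currently held
    )


def cluster_continues(votes_by_node, cluster_quorum):
    """Step 3: activity continues only while total votes >= quorum."""
    return sum(votes_by_node) >= cluster_quorum


# Hypothetical cluster: three nodes, 1 vote each, EXPECTED_VOTES = 3 on all.
q_formed = recompute_quorum(0, [3, 3, 3], [1, 1, 1])

# One node leaves: quorum is not lowered, and two votes still meet it.
q_after_one_left = recompute_quorum(q_formed, [3, 3], [1, 1])
runs_with_two = cluster_continues([1, 1], q_after_one_left)

# A second node leaves: one vote is below quorum, so the survivor hangs.
runs_with_one = cluster_continues([1], q_after_one_left)
```

Here quorum comes out as 2 when the cluster forms, stays 2 after the first node leaves, and the last remaining node loses quorum.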
Once a cluster is up and running, does the SYSGEN parameter
EXPECTED_VOTES matter ?
It doesn't matter what its value in MODPARAMS.DAT is. As the above
says, the connection manager can increase it if necessary. However, it
is not listed as a dynamic parameter. I THINK it is used only at
startup; once the cluster is formed, VOTES is what matters. You can
start out with three nodes each with 1 vote, then add members (each with
one vote) until you get to 96 nodes. Quorum will then be 49, so you
can't remove more than 47 nodes without adjusting EXPECTED_VOTES by
hand.
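
The 96-node arithmetic can be checked with a few lines of Python. This is just a sketch of the reasoning above, assuming every node contributes exactly 1 vote and that quorum is recomputed (and never lowered) each time a node joins; the variable names are made up.

```python
def quorum(votes: int) -> int:
    """(votes + 2) / 2, rounded down."""
    return (votes + 2) // 2


# Start with 3 one-vote nodes and add members one at a time up to 96.
cluster_quorum = 0
for members in range(3, 97):
    # Quorum only ever goes up as votes are added.
    cluster_quorum = max(cluster_quorum, quorum(members))

# Votes that can disappear before the total drops below quorum.
max_removable = 96 - cluster_quorum
```

With these assumptions cluster_quorum ends up as 49 and max_removable as 47, which matches the figures in the paragraph above.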
What happens when 3 nodes in a cluster all have different EXPECTED_VOTES
SYSGEN values ? Wouldn't the cluster software automatically calculate
the right value for expected votes ?
See above; it takes the largest.
I just checked on 3 nodes: 2 have "5" as EXPECTED_VOTES and the
workstation has "3". So obviously I have to review that.
That means quorum is 3. However, if you normally have 3 nodes in your
cluster, you would normally want quorum to be 2.
Set EXPECTED_VOTES to just that, the number of votes expected when the
cluster is up and running under normal conditions, in MODPARAMS.DAT and
run AUTOGEN through SETPARAMS. Change it when you change the number of
nodes in your cluster under NORMAL conditions.
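
For the 5/5/3 case, a small Python sketch of why quorum comes out as 3 rather than the desired 2. The node names are hypothetical and each node is assumed to hold 1 vote.

```python
def quorum(votes: int) -> int:
    """(votes + 2) / 2, rounded down."""
    return (votes + 2) // 2


# EXPECTED_VOTES as currently set on each node (names are made up).
expected_votes = {"node_a": 5, "node_b": 5, "workstation": 3}

# The connection manager uses the largest EXPECTED_VOTES present.
current_quorum = quorum(max(expected_votes.values()))

# With EXPECTED_VOTES set to the real total of 3 votes on every node.
desired_quorum = quorum(3)

# Nodes that can be lost in a 3-vote cluster in each case.
spare_now = 3 - current_quorum
spare_fixed = 3 - desired_quorum
```

With EXPECTED_VOTES left at 5, all three nodes must be present to keep quorum; with it set to 3, the cluster can survive the loss of one node.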
1- once the cluster is up, the value of the QUORUM was calculated from
the maximum number of votes that were seen since the cluster was formed
(with each node contributing a fixed number of votes read from SYSGEN).
Unless the maximum number of EXPECTED_VOTES is higher.
Guess what REMOVE_NODE does ? it does a SET CLU/EXPECTED_VOTES (or at
least, that is what I remember I saw). And in my case, when SET
CLU/EXPECTED didn't work, I did the REMOVE_NODE and the remaining VAX
still froze due to lack of quorum.
This is strange. However, note that with a 3-node cluster, with quorum
of 2, there is no need for REMOVE_NODE, since in a 2-node cluster quorum
is also 2 (with 1 vote per node). It makes sense if you normally have,
say, 4 nodes with quorum of 3. You can afford to lose 1. However, if
you take one down for maintenance, it would be good to use REMOVE_NODE,
since quorum will now be 2, and you could afford to lose another node.
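
The 3-node versus 4-node comparison, sketched in Python. This only models the vote arithmetic, assuming 1 vote per node and assuming that REMOVE_NODE results in quorum being recomputed from the votes that remain; the helper names are made up.

```python
def quorum(votes: int) -> int:
    """(votes + 2) / 2, rounded down."""
    return (votes + 2) // 2


def tolerable_losses(votes_present: int, cluster_quorum: int) -> int:
    """How many more one-vote nodes can leave before quorum is lost."""
    return votes_present - cluster_quorum


# 4-node cluster, all up: quorum 3, one node can be lost.
four_up = tolerable_losses(4, quorum(4))

# One node down for maintenance, quorum left at 3: no further loss is survivable.
three_up_unadjusted = tolerable_losses(3, quorum(4))

# Same situation with quorum lowered for 3 votes: one more node can be lost.
three_up_adjusted = tolerable_losses(3, quorum(3))

# 3-node cluster: quorum is 2 before and after dropping to 2 nodes,
# so lowering it gains nothing.
quorum_unchanged = quorum(3) == quorum(2)
```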