Re: Partitioned cluster question (reboot during lost quorum)
- From: Hoff Hoffman <hoff-remove-this@xxxxxx>
- Date: Wed, 19 Apr 2006 17:17:48 GMT
If you would prefer it, I and/or other engineers here at HP can certainly subcontract for your coding and your cluster management tasks here -- HP does offer that and other similar consulting services, of course.
JF Mezei wrote:
hoffman@xxxxxxxxxxxxxxxxxxxx wrote:
If you would like to learn how to properly set the VOTES and the
EXPECTED_VOTES parameters, do please consider reviewing the text on
these settings in the OpenVMS FAQ.
It will reboot, but will hang awaiting quorum.
MC SYSGEN HELP SYS EXPECTED_VOTES mentions the MAXIMUM number of
expected votes, not the minimum one.
The FAQ and the manuals also make the same point, and my previous posting around the "creative" settings was intended to dissuade you from any such "creative" -- read: wrong, dangerous, and potentially capable of triggering massive disk data corruptions -- parameter settings.
So after a power failure, if you power up your nodes one by one, you
mean to say that each node would just get stuck near the "waiting to
form or join cluster" until it finds enough votes lying around the
ethernet to get to a quorum based on the maximum expected votes ?
Correct. As part of this processing, connectivity is established, and partitioning is avoided.
I was under the impression that when a node boots and is all alone (eg:
forms a new cluster), it proceeds with a quorum of its own votes/2 + 1,
and then the quorum is dynamically adjusted upwards as other nodes join
the cluster.
"Being under the impression" can be a somewhat hazardous condition if the impression turns out to be incorrect, and the documentation and the FAQ can help ameliorate that situation.
So EXPECTED_VOTES / 2 + 1 defines the MINIMUM number of votes necessary
for any node to proceed with booting beyond the awaiting to form/join
cluster ?
No, EXPECTED_VOTES is the number of VOTES expected in the cluster, whether from voting nodes or from a quorum disk. (And a quorum disk that is not directly accessible through multiple paths and/or through multiple hosts is far from optimal.)
From this EXPECTED_VOTES value, the initial value of the cluster quorum is determined.
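For reference, the OpenVMS Cluster documentation gives the derivation as quorum = (EXPECTED_VOTES + 2) / 2, with the result truncated to an integer. A minimal Python sketch of that arithmetic, applied to the power-up-one-node-at-a-time case discussed above, might look like the following. The function names are invented for illustration and are not OpenVMS routines, and the sketch assumes every node carries the same EXPECTED_VOTES value and ignores the adjustments a running cluster makes as members join.

```python
# Illustrative sketch only: these helpers are hypothetical, not OpenVMS code.

def quorum_from_expected_votes(expected_votes):
    """Initial quorum: (EXPECTED_VOTES + 2) / 2, truncated to an integer."""
    return (expected_votes + 2) // 2


def boot_sequence(expected_votes, node_votes):
    """Power nodes up one at a time.

    node_votes is the VOTES value of each node, in boot order.
    Returns one (votes_present, quorum, can_proceed) tuple per boot.
    Assumption: all nodes share the same EXPECTED_VOTES setting.
    """
    quorum = quorum_from_expected_votes(expected_votes)
    votes_present = 0
    states = []
    for votes in node_votes:
        votes_present += votes
        # A node (or group of nodes) hangs until the votes it can see
        # reach the quorum derived from EXPECTED_VOTES.
        states.append((votes_present, quorum, votes_present >= quorum))
    return states


if __name__ == "__main__":
    # Three nodes with VOTES=1 each and EXPECTED_VOTES=3.
    for present, quorum, ok in boot_sequence(3, [1, 1, 1]):
        status = "proceeds" if ok else "waits for quorum"
        print("votes present:", present, "quorum:", quorum, "->", status)
```

With EXPECTED_VOTES=3 and three one-vote nodes, the first node to boot sees one vote against a quorum of two and waits; once the second node appears, the two together reach quorum and proceed.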
Information on recommended VOTES settings for particular cluster configurations has been discussed before here in the 'groups, and that discussion is certainly fodder for an update to the FAQ. There are certainly discussions of that topic within the existing manuals, as well. The discussions all tend to devolve into discussions of the connections available among the hosts and (when a quorum disk is configured) the disk storage, and the particular combination of systems that should be kept operational for the longest as the availability of the particular cluster configuration degrades.
Perhaps the documentation should say it this way instead of talking
about MAXIMUM number of votes to be expected.
Some folks thought they could re-jigger the old QUORUM setting -- which is also where you appear headed here -- so the documentation was simplified and the configuration was simplified and the whole discussion was removed. There was also some "hardening" around "creative" system parameter settings implemented, and work and discussions have continued to occur in the area of "hardening" the cluster configurations.
I once tried to adjust expected_votes downwards with DCL (SET
CLUSTER/EXPECTED_VOTES) and that didn't work. Does this mean that SET
CLUSTER/EXPECTED_VOTES will not let you bring the number below the
SYSGEN parameter EXPECTED_VOTES ? But if that number does grow beyond
it, you can then use the DCL command to bring it back down to the SYSGEN
value ?
Clustering works very hard to prevent you from corrupting your data. The usual means of adjusting the quorum downward is to remove the node, and to then issue the SET CLUSTER/EXPECTED_VOTES command, or to use the SHUTDOWN.COM option, or to use the IPC (IPLC) quorum handler. Clustering goes out of its way to avoid allowing the system manager to corrupt the data.
(I tried to use this while in the process of reconfiguring the cluster
when I knew that leaving NODE X up and running would not jeopardize
integrity but it didn't have enough votes to keep on working so I ended
up without service with the other node rebooted).
And clustering here specifically prevented you from your attempted "creativity", as it was able to detect and prevent the configuration error. Where "creative" settings really cause problems is when there are communications problems -- in that case, incorrect system parameter settings cannot be detected, and the disk data can become corrupted.
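To illustrate why a communications failure is the dangerous case, here is a rough Python sketch that checks which isolated groups of nodes would consider themselves to have quorum. It uses the documented (EXPECTED_VOTES + 2) / 2 calculation; the function names are hypothetical, and the model is deliberately simplified (one shared EXPECTED_VOTES value, no quorum disk, no run-time quorum adjustment).

```python
# Illustrative sketch only: a simplified model, not OpenVMS behaviour in full.

def quorum(expected_votes):
    """Quorum derived from EXPECTED_VOTES, truncated to an integer."""
    return (expected_votes + 2) // 2


def groups_that_proceed(expected_votes, groups):
    """groups: one list of per-node VOTES for each set of nodes that can
    still talk to each other. Returns True for each group that reaches
    quorum on its own."""
    q = quorum(expected_votes)
    return [sum(group) >= q for group in groups]


def is_partitioned(expected_votes, groups):
    """True when more than one isolated group would keep running, which
    is the condition that can corrupt shared disks."""
    return sum(groups_that_proceed(expected_votes, groups)) > 1


if __name__ == "__main__":
    # Two one-vote nodes that have lost contact with each other.
    isolated = [[1], [1]]
    # Correct setting: EXPECTED_VOTES=2, quorum 2, both nodes hang.
    print("EXPECTED_VOTES=2 partitioned:", is_partitioned(2, isolated))
    # "Creative" setting: EXPECTED_VOTES=1, quorum 1, both nodes run.
    print("EXPECTED_VOTES=1 partitioned:", is_partitioned(1, isolated))
```

In this model, the correct setting leaves both isolated nodes hung awaiting quorum, while the understated EXPECTED_VOTES lets each node proceed on its own. Neither node can see the other, so neither can detect that the setting is wrong.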
The OpenVMS documentation and the OpenVMS FAQ are available at:
<http://www.hp.com/go/openvms/doc/>
<http://www.hp.com/go/openvms/faq/>