Re: Partitioned cluster question (reboot during lost quorum)
- From: JF Mezei <jfmezei.spamnot@xxxxxxxxxxxx>
- Date: Wed, 19 Apr 2006 21:02:54 -0400
Hoff Hoffman wrote:
> If you want to boot outside the cluster, then I tend to prefer to
> avoid enabling NISCS_LOAD_PEA0 and I don't load VAXCLUSTER.
No, the idea is to be able to boot one node first before you bring in
the other ones, so that first one has to form the cluster and the others
join in later.
> Consider the last time this has happened, too. I've been at this for
> over a decade, and I can't recall ever having run into this case.
What I've come to realise is that it is very hard to predict every
possible problem/situation.
The ability to find a way out of an unpredicted situation requires a
good understanding of the clustering process and a full understanding of
your configuration/applications, so that you can "cheat" the config
without jeopardizing data integrity.
VMS cannot make any assumptions when it loses connection with another
node. It could be Ethernet down, it could be the other node down. VMS
(rightly) protects against the worst-case scenario. But the system manager
can obtain additional information (such as confirmation that the other
nodes are in fact powered down) which then allows him to gauge the
situation and conclude that the first node can be up without
jeopardizing data integrity, at which point knowledge of how to "cheat"
the predefined rules comes in handy to get out of that situation (and
to restore the safe settings later).
> As a rule, the node with the valid copy will have the highest instantiation,
> and will be the source for the shadow copy operation,
Your "as a rule" applies well when all disks are local and, if you have
access to one disk, you have access to all disks (aka, you always mount
all members of a shadowset together in the same mount command, so the
shadowing software can make the proper determination of which physical
disk has the more recent valid copy).
But when shadowset members appear gradually as nodes boot, then the
order in which you boot the nodes becomes critical to ensuring the right
physical drive is first mounted into the shadowset. Knowledge of which
physical drive has the valid contents comes from knowing the config, the
exact state of the cluster before the mishap and its current state. AKA:
full situation awareness. And it isn't something which VMS can ascertain
by itself. It takes a human to diagnose the situation.
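
To make the boot-order point concrete, here is a toy model in Python. It is only a sketch and not how the shadowing software is implemented: the `Member` class, the `generation` field (a stand-in for "how recent is this copy") and the device names are all made up for illustration. The only idea it borrows from the discussion above is that the source is chosen from the members that are visible at mount time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    device: str      # hypothetical device name, for illustration only
    generation: int  # hypothetical "recency" counter; higher means newer


def pick_source(visible: List[Member]) -> Member:
    """Pick the most recent copy among the members that can be seen now."""
    return max(visible, key=lambda m: m.generation)


current = Member("NODEA$DKA100", generation=42)  # has the valid contents
stale = Member("NODEB$DKA100", generation=37)    # was left behind earlier

# Both members visible in the same mount: the current copy is the source.
print(pick_source([current, stale]).device)

# Only the stale member is visible (its node booted first): it is the
# only candidate, so it becomes the source.
print(pick_source([stale]).device)
```

In the second call nothing in the model can know that a newer copy exists on a node that is not up yet, which is why the boot order has to come from the person who knows the state of the cluster.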
So knowing how expected_votes works means you can cheat the default
config to allow one node to boot first when you know this is what has to
be done.
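
The arithmetic behind this can be sketched in a few lines of Python. It assumes the usual quorum formula, (EXPECTED_VOTES + 2) / 2 with the fraction dropped, and it deliberately ignores quorum disks and any adjustment of quorum once more members have joined. The function names are made up for the example.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES, using integer division."""
    return (expected_votes + 2) // 2


def can_form_cluster(votes_present: int, expected_votes: int) -> bool:
    """True if the votes present at boot reach the derived quorum."""
    return votes_present >= quorum(expected_votes)


# Three nodes with one vote each and EXPECTED_VOTES = 3: quorum is 2,
# so a single node booting alone waits for a second voter.
print(quorum(3))               # 2
print(can_form_cluster(1, 3))  # False

# Same node booted with EXPECTED_VOTES temporarily lowered to 1:
# quorum is 1, so it can form the cluster by itself.
print(quorum(1))               # 1
print(can_form_cluster(1, 1))  # True
```

Lowering the value removes exactly the protection against a partitioned cluster, so it only makes sense once the other nodes are confirmed down, and the normal value should go back afterwards.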