Re: 2-node LAVC cluster with quorum disk - network disappears - which node CLUEXITs ?

From: Nic Clews (sendspamhere_at_[127.0.0.1)
Date: 03/01/04


Date: Mon, 01 Mar 2004 15:42:15 +0000

PhilThayer wrote:
>

> One way you might be able to dictate which system stays up is by
> setting LOCKDIRWT higher on one system. That way the system with
> the lower LOCKDIRWT is more likely to CLUEXIT first.

What? Why? How is that supposed to work?

LOCKDIRWT is LOCK DIRectory manager WeighT. It determines the
proportion of the distributed lock database for which _this_ node will
act as the directory manager, relative to the other members of the
cluster. It gives every cluster member a deterministic way of working
out which node will, in the first instance, be taking care of the
distributed lock in question. The directory is a LOOKUP operation.
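
To make "deterministic lookup" concrete, here is a toy model in Python.
It is only a sketch of the idea, not the OpenVMS lock manager: the
function names, the node names and the use of SHA-256 as the hash are
all assumptions made up for the example. Each node is entered in a
lookup vector LOCKDIRWT times, and a resource name is hashed into that
vector, so every member arrives at the same answer without asking
anyone.

```python
import hashlib

def build_directory_vector(weights):
    """Build a lookup vector from a {node: LOCKDIRWT} mapping.

    Each node appears LOCKDIRWT times, so a node with weight 0
    (a typical satellite) is never chosen as directory node.
    Nodes are sorted so every member builds the identical vector.
    """
    vector = []
    for node in sorted(weights):
        vector.extend([node] * weights[node])
    if not vector:
        raise ValueError("at least one node needs a non-zero LOCKDIRWT")
    return vector

def directory_node(resource_name, vector):
    """Map a resource name to the node holding its directory entry.

    A stable hash is used (hypothetical choice for this sketch) so the
    result is the same on every node and on every run.
    """
    digest = hashlib.sha256(resource_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(vector)
    return vector[index]

# Example: two voting nodes and one satellite (names are invented).
vector = build_directory_vector({"ALPHA": 2, "BETA": 1, "SAT": 0})
owner = directory_node("SOME_RESOURCE", vector)
```

Note that nothing in this lookup says anything about which node stays
in the cluster; it only answers "who do I ask about this lock?".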

The only instance I can think of where this might appear to be observed
is a satellite node with a LOCKDIRWT of 0 and a network interconnect,
where reconfiguration determines that it, as a non-voter, is on the
"losing" side (0 votes and a LOCKDIRWT of 0 is typical for a standard
configuration).

As mentioned, I've tried to explain this before. You really have to
understand that a system in a cluster has to throw away any
preconception of how it may or can connect to any other members of the
cluster. Each and every member forms its own picture of the cluster,
and in fact of multiple optimal subclusters (actually discounting some
of the members it could include). This information is exchanged
between all the members, and when all the members agree they have the
same ("most important") members, reconfiguration completes, and any
"rejected" or not included members will gracefully CLUEXIT.

Other people have mentioned more votes and higher SCSSYSTEMIDs. I
didn't see mention of software version, which also plays a part (higher
is more). The incarnation time also has influence, and then timing (or
chance) comes into play.
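
A rough Python sketch of that selection step follows. It is an
illustration under stated assumptions, not the real connection manager:
the Member class, the function names and, in particular, the ORDER of
the tie-breakers (total votes, then member count, then highest
SCSSYSTEMID) are hypothetical, and software version, incarnation time
and timing are left out altogether. The quorum formula is the usual
(EXPECTED_VOTES + 2) / 2, rounded down. The point to notice is that
LOCKDIRWT is carried in the data but never consulted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Member:
    name: str
    votes: int
    scssystemid: int
    lockdirwt: int  # present only to show it plays no part below

def quorum(expected_votes):
    """Quorum as derived from EXPECTED_VOTES (integer division)."""
    return (expected_votes + 2) // 2

def pick_survivor(candidates, expected_votes):
    """Choose the surviving subcluster from the proposed candidates.

    candidates: iterable of frozensets of Member.
    Returns the winning frozenset, or None if no candidate has quorum.
    The ranking key is an ASSUMED ordering, for illustration only.
    """
    needed = quorum(expected_votes)
    viable = [c for c in candidates if sum(m.votes for m in c) >= needed]
    if not viable:
        return None
    return max(
        viable,
        key=lambda c: (
            sum(m.votes for m in c),          # more votes wins
            len(c),                           # then more members
            max(m.scssystemid for m in c),    # then higher SCSSYSTEMID
        ),
    )

def cluexit_list(all_members, winner):
    """Names of the members left out of the winning subcluster."""
    return sorted(m.name for m in all_members if m not in winner)

# Example: two voters and a satellite that lost its interconnect.
a = Member("A", votes=1, scssystemid=1025, lockdirwt=0)
b = Member("B", votes=1, scssystemid=1026, lockdirwt=5)
sat = Member("SAT", votes=0, scssystemid=1030, lockdirwt=0)
winner = pick_survivor([frozenset({a, b}), frozenset({sat})], expected_votes=2)
leaving = cluexit_list([a, b, sat], winner)
```

Changing any lockdirwt value in the example leaves the outcome
unchanged, because the ranking key never looks at it.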

But LOCKDIRWT? No. Probably one of the most misunderstood cluster
parameters it is.

-- 
Regards, Nic Clews a.k.a. Mr. CP Charges, CSC Computer Sciences
nclews at csc dot com

