Re: Interesting cluster config "deadlock"



I know of some full production environments that have been like this for
many months (years?).
I managed an environment where a VAX with locally attached DSSI disks
was clustered with a pair of turbolasers with shared SCSI. The VAX
needed stuff from the Alphas to boot and the Alphas needed stuff from
the VAX.
We also needed to retain cluster quorum.
The ultimate answer was to bring up one Alpha with very little
started, then bring up the VAX and the other Alpha, then reboot the
first Alpha.
Messy, but it worked.

Steve

JF Mezei wrote:
The local transformer blew its fuse on a very cold winter day. I was
literally powerless to keep my systems running.

Upon rebooting, I found myself in an interesting situation. Being in the
(slow) process of moving stuff and restructuring my cluster, I found out my
cluster had been left in a precarious state!

SYSUAF (et al.) was still on a node1 disk. User disk is on node2, but node2
boots off node3.

node1 was still in charge of defining certain clusterwide logicals pointing
to disks now served by node2. So when node1 booted, those disks were not
available and the logicals were missing a device name :-)
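
For anyone wanting to check their own layout for this kind of loop before the
power goes out, here is a minimal sketch in Python. The dependency map is my
reading of the situation described above and is partly an assumption (in
particular that node2 cannot finish starting without SYSUAF from node1); the
names boot_needs and boot_order are made up for illustration.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical dependency map: each node lists the nodes it needs up first.
boot_needs = {
    "node1": {"node2"},           # node1's logicals point at disks served by node2
    "node2": {"node1", "node3"},  # assumed: needs SYSUAF from node1; boots off node3
    "node3": set(),
}

def boot_order(needs):
    """Return a workable boot order, or None if the dependencies are circular."""
    try:
        return list(TopologicalSorter(needs).static_order())
    except CycleError:
        return None

print(boot_order(boot_needs))
```

A result of None means no sequential boot order can satisfy every dependency,
which is the "deadlock" in the subject line.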

And because of cluster quorum, I could not sequentially boot the nodes in
the right order. Node1 had to wait for enough other nodes to boot before
continuing its boot process. And once enough votes were present, the order
of booting was dictated by system speed.
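
As a reminder of the arithmetic behind that wait: OpenVMS derives quorum from
EXPECTED_VOTES as (EXPECTED_VOTES + 2) / 2, truncated to an integer. A small
sketch in Python follows; the vote counts are hypothetical, since the post
does not say how many votes each node holds.

```python
def quorum(expected_votes):
    """Quorum derived from EXPECTED_VOTES, using integer division."""
    return (expected_votes + 2) // 2

def has_quorum(votes_present, expected_votes):
    """True when the votes of the nodes currently up reach quorum."""
    return sum(votes_present) >= quorum(expected_votes)

# Hypothetical: three nodes with one vote each, EXPECTED_VOTES = 3.
print(quorum(3))              # quorum value for EXPECTED_VOTES = 3
print(has_quorum([1], 3))     # only one node up
print(has_quorum([1, 1], 3))  # two nodes up
```

With those assumed numbers a single node cannot proceed alone, so it hangs
until a second voting member joins, whichever that happens to be.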

In the end, I managed to get it all up, but it required reboots of some
machines once the other machines were up and able to serve the good disks.

This is something I had not considered before.

So now there is a bit more pressure on my derrière to complete my cluster
reconfig and make it robust enough to be able to recover fully
automatically from a power failure.

Just something to keep in mind when moving stuff around in a cluster.
