DIFVOLMNT (%X0072832C), then bugcheck when booting any node.

From: Galen (gspamtackett_at_yahoo.com)
Date: 10/03/03


Date: 3 Oct 2003 04:47:26 -0700

We've gotten into this situation with our cluster twice recently. (I'm
not referring to a VOLALRMNT error, which is a different numerical
status.)

Configuration is:

* A single OpenVMS Alpha V7.3-1 system disk that is very current on
patches.
* The system disk lives on an HSG80, reached via a SAN core switch.
* Satellites do not have any shared-storage connections (i.e. no DSSI,
no Fibre Channel, no shared SCSI).
* 13 boot servers and 7 satellites, all Alphas.
* StorageWorks RAID software is running (not sure how relevant that
is).

In both cases, we had recently run CLUSTER_CONFIG to add a new server
node. However, in each case, the new node had no physical LAN
connection (fiber not hooked up) and took a CLUEXIT bugcheck after a
few minutes.

Each time, shortly after the CLUEXIT, we got the node's LAN connection
working and re-ran CLUSTER_CONFIG. Just after the new node reached the
point where it reports there's no pagefile on the system disk
(%SYSINIT-I-PAGEFILE), it reported:

%SYSINIT-E-Error mounting system device, status = 0072832C
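
For readers who want to pick the status value apart, here is a minimal
Python sketch. It assumes the standard OpenVMS condition-value layout
(severity in bits 0-2, message number in bits 3-14, facility-specific
flag in bit 15, facility number in bits 16-27, control bits in 28-31).
The names decode_condition and ConditionValue are made up for this
illustration and are not part of any VMS tool.

```python
from typing import NamedTuple

# Severity codes per the assumed condition-value layout.
SEVERITY_NAMES = {0: "W", 1: "S", 2: "E", 3: "I", 4: "F"}


class ConditionValue(NamedTuple):
    severity: int            # bits 0-2
    message_number: int      # bits 3-14
    facility_specific: bool  # bit 15
    facility_number: int     # bits 16-27
    control: int             # bits 28-31


def decode_condition(status: int) -> ConditionValue:
    """Split a 32-bit condition value into its fields (hypothetical helper)."""
    return ConditionValue(
        severity=status & 0x7,
        message_number=(status >> 3) & 0xFFF,
        facility_specific=bool((status >> 15) & 0x1),
        facility_number=(status >> 16) & 0xFFF,
        control=(status >> 28) & 0xF,
    )


if __name__ == "__main__":
    cv = decode_condition(0x0072832C)
    print(cv, SEVERITY_NAMES.get(cv.severity, "?"))
```

Under that layout, 0072832C decodes to severity 4 (fatal), facility
number 114, and a facility-specific message number of 101.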

We checked these things:

* No other cluster uses the same cluster ID (we have only one other
cluster).
* All systems have VAXCLUSTER set to 2.
* The volume label on the system disk has not been changed since the
cluster was last booted.

The only solution we've found is to reboot the cluster (not a pleasant
option, of course).

But we're just as concerned to find out what's causing this. I suspect
that the CLUEXIT during CLUSTER_CONFIG is somehow involved, but I have
only a little circumstantial evidence, as described here.

HP software support and the maintainer of the MOUNT code have given us
a little script to periodically check the volume's SCB and report if
its checksum changes. Beyond that, they're out of ideas right now.
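
The general idea of such a monitor can be sketched as follows. This is
NOT the script HP supplied (its contents are not shown here); it is a
generic Python illustration that hashes one 512-byte block of a file
and reports when the hash changes between polls. The path, the block
index, the 512-byte block size, and the names block_digest and
watch_block are all assumptions. It also hashes the whole block with
SHA-256 instead of reading the checksum field stored in the SCB.

```python
import hashlib
import time
from typing import Callable, Optional

BLOCK_SIZE = 512  # assumed block size in bytes


def block_digest(path: str, block_index: int = 0,
                 block_size: int = BLOCK_SIZE) -> str:
    """Return the SHA-256 hex digest of one block of the given file."""
    with open(path, "rb") as f:
        f.seek(block_index * block_size)
        data = f.read(block_size)
    return hashlib.sha256(data).hexdigest()


def watch_block(path: str, interval_s: float = 60.0,
                max_checks: Optional[int] = None,
                report: Callable[[str], None] = print,
                sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll the block and report each change; return the change count."""
    baseline = block_digest(path)
    changes = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        sleep(interval_s)
        current = block_digest(path)
        checks += 1
        if current != baseline:
            changes += 1
            report(f"{time.ctime()}: block changed: "
                   f"{baseline[:12]} -> {current[:12]}")
            baseline = current  # compare later polls against the new state
    return changes
```

Logging a timestamp with each change makes it possible to line the
event up against cluster activity such as a CLUSTER_CONFIG run or a
CLUEXIT on another node.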

(FYI, the bad connections occur because our fiber cable plant is very
badly documented, has a lot of old labels, and some of the fibers have
been damaged at one time or another. But this is another issue.)

Thanks for any help or suggestions,

Galen
