shadow sets, cluster, merge, MVTIMEOUT, dismount



On my hobbyist cluster, I have only SCSI disks. Each disk has a direct
connection to only one node. All disks are served to all nodes. The
few non-shadowed disks are mounted on all nodes. Most disks are part of
two-member shadow sets. In the case of system disks, both members have
a direct connection to the same node. In other cases, the members have
direct connections to different nodes. All shadow sets are mounted by
all nodes.

If I read the somewhat cryptic code correctly, SYS$SYSTEM:SHUTDOWN.COM
dismounts all mounted disks on the node to be shut down, whether they
are individual disks or shadow sets and, in the latter case, wherever
the members are. If REMOVE_NODE is specified, then it dismounts disks
clusterwide if a) they are not shadow sets and b) the node to be shut
down hosts the disk.
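
To make my reading of SHUTDOWN.COM concrete, here is a minimal Python sketch of the dismount rule as I understand it. It is only a model of the paragraph above, not a transcription of the actual DCL; the Disk class, the dismount_scope function and the device names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str                # hypothetical device name, e.g. "DKA100"
    is_shadow_set: bool      # True for a shadow set, False for a single disk
    hosts: frozenset         # nodes with a direct connection to the disk


def dismount_scope(disk, node, remove_node):
    """Return "cluster" or "local" for a disk mounted on the node shutting down.

    Assumption: this mirrors my reading of SHUTDOWN.COM, nothing more.
    """
    # Clusterwide dismount only with REMOVE_NODE, and only for
    # non-shadowed disks hosted by the node being shut down.
    if remove_node and not disk.is_shadow_set and node in disk.hosts:
        return "cluster"
    # Everything else that is mounted is dismounted on this node only.
    return "local"
```

Under this reading, a shadow set is never dismounted clusterwide by SHUTDOWN.COM, even when both members hang off the node going down.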

I want to avoid unnecessary shadow merges. It is clear that if a node
has files open on a shadow set and this shadow set disappears, either
because a) the node reboots or b) the members of the shadow set become
dismounted (for example, when the nodes hosting them reboot), then a
merge will occur. I need to make sure that no open files exist during
shutdown. Normally, it's enough to shut down non-system applications,
but I've found that it is necessary to explicitly kill some system
processes, namely DECW$SERVER_0. (While not strictly necessary from the
open-files point of view, TCPIP is normally shut down during system
shutdown, and applications with BG devices should be shut down before
TCPIP is. On the system side, these are VMP_SERVER and FORMS_SERVER.
If the system is being shut down completely, then left-over BG devices
are probably not a problem. But they shouldn't be hanging around when
TCPIP is shut down in other situations, so if one uses the same
TCPIP-shutdown procedure everywhere, then in practice VMP_SERVER and
FORMS_SERVER will be shut down as well.)

I have SYSUAF etc on a non--system-disk shadow set; DECW$SERVER_0 has
RIGHTSLIST.DAT open. (Or is there something in SHUTDOWN.COM which would
cause this process to be killed?)

Let's say A is shutting down, while B and C remain up. There are
several cases, depending on where the shadow-set members are:

1) both on A

2) both on B or C

3) one on A and one on B or C

4) on B and C

In case 1), it will be dismounted on A but not on B and C. Is it
necessary to dismount it on B and C in order to avoid a merge? Or will
a merge occur only if there are open files on the shadow set? The
documentation says:

· If a failure occurs in a cluster, the shadow set is merged
by a remaining node that has the shadow set mounted:

Does this mean that a merge will occur as soon as A is back up? (In
keeping with the logic of SHUTDOWN.COM, it would seem to me that in case
1) the shadow set should be dismounted /CLUSTER at shutdown. This would
also avoid a merge if otherwise one would occur, at least if REMOVE_NODE
is specified.)

2) and 4) pose no problems: A will dismount the shadow set and it is
otherwise unaffected.

What about 3)? Since one member will be unavailable during the reboot,
then at least if there is I/O then a copy will be needed. (If A is the
only ALPHA in the cluster, then a minicopy will not be possible.) Under
what conditions will a copy be needed:

o B or C has open files on the shadow set, but no I/O is performed

o B or C (wants to) perform(s) I/O on the shadow set

Do the answers in the above two cases depend on whether the member on A
is unavailable for shorter or longer than MVTIMEOUT? If the disk is
available again before MVTIMEOUT runs out, what happens when I/O is
attempted while the disk is away? Does it stall until the disk is back,
or does it complete on the remaining member and then get performed on the
disk connected to A as soon as that disk is available again? (Again, in keeping
with the logic of SHUTDOWN.COM, if REMOVE_NODE is specified, then it
would make sense to DISMOUNT/CLUSTER the members of shadow sets
connected to A at shutdown.)

It certainly doesn't hurt to dismount the shadow set in case 1). It
would be bad to do so (and force a shadow copy) if not necessary in case 3).
On the other hand, if node A will be away for a while (REMOVE_NODE),
then it's probably a good idea.
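
The four cases and my tentative conclusions can be summarized in a small Python sketch. It encodes only the reasoning in this post, not confirmed OpenVMS behavior; the function names classify and dismount_clusterwide are hypothetical, and the node names follow the A, B, C example above.

```python
def classify(member_hosts, node):
    """Return the case number (1 to 4) for a two-member shadow set.

    member_hosts: pair of node names, one per member's direct connection.
    node: the node being shut down (A in the example).
    """
    on_node = sum(1 for host in member_hosts if host == node)
    if on_node == 2:
        return 1                      # both members on the node going down
    if on_node == 1:
        return 3                      # one member here, one elsewhere
    # Neither member is on the node going down.
    return 2 if member_hosts[0] == member_hosts[1] else 4


def dismount_clusterwide(case, remove_node):
    """Tentative recommendation only; the open questions above may change it."""
    if case == 1:
        return True                   # harmless, and may avoid a merge
    if case == 3:
        return remove_node            # avoid forcing a copy for a short reboot
    return False                      # cases 2 and 4: local dismount suffices


# Node A shuts down with REMOVE_NODE; members are on A and C.
case = classify(("A", "C"), "A")
print(case, dismount_clusterwide(case, remove_node=True))
```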

I also want to avoid disks (either entire shadow sets or (some of) their
members) going into MVTIMEOUT such that I have to reboot something to
get back to normal.
