Re: shadowing questions



Phillip Helbig---remove CLOTHES to reply wrote:
Assume that each shadow set consists of just two members, each of which has a local connection to only one machine---except in the case of system-disk shadow sets, the disks are connected to different nodes.

Suppose a node starts a full shadow copy on a shadow set the disks of which are connected to two other nodes in the cluster, but then the first node crashes. Will the copy be picked up, where it was left off, by another node? Or will the entire copy start anew?

If there is another node in the cluster which has the shadowset mounted, and that node still has access to the disks, I would expect that node to pick up the full-copy at the point where it was left off.


Suppose a full copy is going on, but the node with the copy source becomes unavailable. Will the entire shadow set go into mount verification, picking up the copy where it was left off when the node, and the disk which is connected to it, become available again? Or will the entire copy start anew?

Assuming the copy source is the only source member in the shadowset, the entire virtual unit will go into mount verification. If it comes back soon enough (probably before MVTIMEOUT seconds) I would expect the mount verification to succeed and the copy to proceed from where it left off.


I have SHADOW_MAX_COPY=1 in MODPARAMS.DAT on all machines (a higher value doesn't make sense since all communication is on the LAN). When a VAX boots, it looks for a special logical in the cluster table. If this is defined, then via SYSGEN SHADOW_MAX_COPY is set to 0. The operator can then do a MOUNT from an ALPHA to enable a MINICOPY to a disk which went down with the VAX which just rebooted. (This works even if the shadow set has no members with a direct connection to an ALPHA.) After this is done, SHADOW_MAX_COPY gets reset to 1. (I don't like to have it set to 0 by default: When the VAX boots, its own system-disk shadow set might need a copy or merge, and it is best that the VAX itself do it. Also, for unplanned shadow copies, where no MINICOPY is possible, it is fine (or even preferable) for the copy to be done by a VAX.)

It would be fine, and since the VAX has a direct connection to the target disk, I would think it would also be preferable for it to do the full-copy operation.


My third question is whether such logic makes sense for an ALPHA as well. In other words, suppose an ALPHA is to be rebooted, and I do a DISMOUNT/POLICY=MINICOPY on the disks connected to it from another ALPHA in the cluster. When this ALPHA reboots, I obviously don't want a VAX to pick up the copy, since then no MINICOPY is possible. (Thus, when the special logical is defined at DISMOUNT, I also set SHADOW_MAX_COPY to 0 on all VAX nodes via SYSMAN.) However, what about the booting ALPHA itself? Is it capable of performing a MINICOPY to a disk connected (only) to it (but, of course, MSCP served to all other nodes), or does this have to be done from another ALPHA? (I'm wondering whether the write bitmaps propagate to a node after it boots, like, for example, cluster-wide logicals.) If so, then I shouldn't set SHADOW_MAX_COPY to 0 on the ALPHAs when booting, so that the booting ALPHA has the possibility to pick up the minicopy (which might be the preferred scenario). If not, then I should use the same code as on the VAXes.

Bit-maps aren't copied between nodes, so the booting Alpha couldn't do the mini-copy, as it would have no bit-map; another Alpha with a write bitmap would have to do it.
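
As a rough illustration of why the bitmap holder matters, here is a toy Python model. It is not OpenVMS code: the Node class, the function names, the node names and the block counts are all invented for this sketch. It also simplifies by letting a node with no bitmap fall back to a full copy, whereas in the real cluster another Alpha holding the bitmap would do the mini-copy.

```python
# Toy model only: every name and number here is hypothetical,
# not an OpenVMS internal or API.

class Node:
    def __init__(self, name):
        self.name = name
        self.bitmap = None  # set of changed block numbers, or None if no bitmap


def remove_member_with_bitmap(nodes_up):
    """Start tracking writes on every node that is up at removal time."""
    for node in nodes_up:
        node.bitmap = set()


def write_block(block, source, nodes_up):
    """Apply a write to the remaining member and record it in each bitmap."""
    source[block] = source.get(block, 0) + 1
    for node in nodes_up:
        if node.bitmap is not None:
            node.bitmap.add(block)


def copy_to_returning_member(node, source, target, total_blocks):
    """Copy from source to target; return how many blocks had to be copied."""
    if node.bitmap is None:
        blocks = range(total_blocks)   # no bitmap on this node: full copy
    else:
        blocks = sorted(node.bitmap)   # bitmap present: changed blocks only
    for b in blocks:
        target[b] = source.get(b, 0)
    return len(blocks)


TOTAL_BLOCKS = 1000
alpha1 = Node("ALPHA1")               # up when the member was removed
source = {}

remove_member_with_bitmap([alpha1])
for b in (3, 7, 7, 42):               # block 7 is written twice
    write_block(b, source, [alpha1])

alpha2 = Node("ALPHA2")               # booted afterwards, so it has no bitmap

minicopy_blocks = copy_to_returning_member(alpha1, source, {}, TOTAL_BLOCKS)
fullcopy_blocks = copy_to_returning_member(alpha2, source, {}, TOTAL_BLOCKS)
print(minicopy_blocks, fullcopy_blocks)  # prints: 3 1000
```

In this model the node that held the bitmap copies three blocks, while the late-booting node would have to touch all 1000.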


Is there some way to allow a MINICOPY in the case of unplanned node crashes? I'm thinking of specifying /POLICY=MINICOPY on the MOUNT command (I think this can be done now, but I believe it just makes this the default when a DISMOUNT command occurs) so that a write bitmap is created then. When a node crashes, then a MINICOPY instead of a full copy could be done when the copy target is available again. (Obviously, this would only make sense for a shadow set on which a relatively small portion of the contents of the disk will have changed between the initial MOUNT and the MOUNT which occurs after the disk has disappeared and become available again.)

You can have up to 6 bitmaps per shadowset, IIRC. If you have some extra disks, the Shadowing developer points out you can do some creative things to speed recovery after an unplanned outage. Let's say you have one "production" disk locally connected per node, and an extra "recovery" disk. Assuming the shadowset were in steady-state with 2 members, you could add a "recovery" disk in and let the full-copy finish. Then you could remove it forming a mini-copy bitmap on the "opposite" node. Next, add the other "recovery" disk in and allow its full-copy to complete, then remove it with a mini-copy bitmap on its opposite node. Now you have two members in the shadowset, and a bitmap on each node tracking all the changes relative to the "recovery" disk on the opposite node. If either node goes away unexpectedly, you can initiate a mini-copy to the "recovery" disk on the opposite node to restore redundancy quickly, then add the "production" disk in with a full-copy afterward to get back to normal (and it's OK for that to take longer because you already have restored redundancy).


The Shadowing developer is also looking into the possibility that Shadowing may be able to, at the point where a member has to be removed, convert a mini-merge bitmap to a mini-copy bitmap for that removed member, and thus track all the changes subsequent to its loss, and allow a mini-copy operation to reintegrate it later. This would be particularly handy to allow mini-copies in disaster-tolerant clusters after a failure which results in downtime of either one site or of the inter-site link.