Boot Error on Shadowed Sys Disk.



I currently run a 3-node Prod Cluster (2 x GS1280 (M16), and 1
ES40). I am running OpenVMS 7.3-2, fully patched (as far as my
vendor has certified), TCPIP Services V5.4, ECO 4. The storage
subsystems are EVA5000, and the system disk is in the SAN. I have
been running in this configuration for over 18 months without any
issues.
On Tuesday last I was trying to replace one of the GS1280 (M16)s
with a new GS1280 (M32). My planned procedure was:
1. Shut down the old M16.
2. Give all of the old connections to the new M32 (Interconnect, SAN,
Network).
3. Boot the new M32 from the same root as was used by the old M16,
i.e., the new system would take over the name, DECnet address and IP
address, in other words "the complete identity", of the old system.
Note: The additional SMP licences etc. were pre-loaded. Nodes 2
and 3 remained up and clustered the whole time.

Events.
Shut down old M16 OK.
Connections transferred OK.
M32 booted OK.
Logged in, reconfigured NIC for TCPIP and DECnet (different device and
interface names on new system).
Carried out several boots without issue.

Problem:

On about the 5th or 6th boot, the new system joined the cluster OK, but
then hung. Checking the systems which were up, I noticed that the
system disk was in "Mount Verify". Eventually the booting system
"timed out (?)" and bugchecked, giving a message, the gist of which was
"Unable to boot from shadowed system disk". After the system
bugchecked, the other systems showed the system disk as back to normal.
(Because the new system has 96GB of memory, and because of time
constraints, I couldn't wait for a crash dump to be written, so I
Ctrl/P'd out of it.)
I INIT'ed the system and tried to boot again, with the same result.
I halted, INIT'ed again, and tried a third time, again with the same
result.

At this point, I decided that I didn't have time for further
troubleshooting and decided to roll back. I was concerned about this
because the problem looked to be software, in particular a problem with
this specific root on the system pack, so I was concerned that the old
M16 might also fail to boot.

However, the old system booted up without any problems (to my immense
relief).

Can anyone shed any light on what might have caused this series of
events?

A possible answer occurred to me this morning while driving in to work.
It is this:

1. Because of other work which was going on, I had (early in the PM
window) reduced all of my shadowsets to a single member (one of the
EVAs was going to be disconnected from the SAN to be worked on).
THIS INCLUDED one unit of THE SYSTEM DISK shadowset.
2. At some point in the proceedings, the EVA was brought back into
the SAN, and at this point I ran a script on Node 3 to remount all of
the missing units and resync (all using mini-copy except the system
disk). The full copy on the system disk would have taken ~30 mins.
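
A rough DCL sketch of how the copy state could be checked from one of
the surviving nodes before booting another system from the same disk.
It assumes the system disk virtual unit is DSA0: (a placeholder name,
not taken from this cluster) and uses only the F$GETDVI shadowing item
codes SHDW_NEXT_MBR_NAME and SHDW_CATCHUP_COPYING:

```dcl
$! Sketch only: DSA0: is an assumed name for the system disk shadowset.
$! The full display lists each member and any copy in progress.
$ SHOW DEVICE/FULL DSA0:
$!
$! Walk the member list and flag any member that is a full-copy target.
$ member = F$GETDVI("DSA0:", "SHDW_NEXT_MBR_NAME")
$loop:
$ IF member .EQS. "" THEN GOTO done
$ IF F$GETDVI(member, "SHDW_CATCHUP_COPYING") THEN -
     WRITE SYS$OUTPUT member, " is still a full-copy target"
$ member = F$GETDVI(member, "SHDW_NEXT_MBR_NAME")
$ GOTO loop
$done:
$ EXIT
```

If the loop prints nothing, no member was a full-copy target at the
time of the check; it says nothing about merge operations or about
what state the shadowset was in during the earlier failed boots.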

I cannot remember the specific time that I ran the REMOUNT script, or
whether it coincided with the start of these problems. HOWEVER:

IS IT POSSIBLE THAT MY PROBLEM WAS CAUSED BY THE SYSTEM DISK
SHADOWCOPYING?

This might explain why I had no problems with the old hardware, viz. by
the time I swapped all of the connections back, the shadow-copy had
finished.

I would appreciate any comments on this issue.

Thanks, Dave.
