Re: OSR 5.0.4: Cannot boot from mirror after drive fails

From: Bela Lubkin (belal_at_sco.com)
Date: 05/25/04

    Date: Tue, 25 May 2004 21:22:19 GMT
    To: scomsc@xenitec.ca
    
    

    Jay wrote:

    > Calling all Virtual Disk Manager experts!

    Not me, but I can help a little...

    > System:
    > OpenServer 5.0.4
    >
    > Problem:
    > - System set up with mirrored boot, root and swap on 2 x 18GB SCSI
    > Ultra160 drives using SCO Virtual Disk Manager.
    >
    > - Hard disk 0 fails completely, system outputs massive amounts of
    > "Unrecoverable error reading SCSI disk 0 ....medium error" messages to
    > /dev/tty01 and then disk starts making some terrible noises.
    >
    > - System becomes unresponsive and has to be powered down.
    >
    > - Tried to boot system into single user mode so that VDM can be used to
    > disable the disk mirror prior to removing faulty drive and booting off
    > the parity disk. Error messages "WARNING: vdisk1: timestamps not closed
    > properly parity out of date".
    >
    > - Just as single user mode command prompt comes up, console is filled
    > with more medium error messages. Impossible to type anything so have to
    > power down.
    >
    > - Removed faulty disk, set SCSI ID of parity drive to be ID 0 (which is
    > the ID of the faulty unit), power up machine again and get lots of
    > "WARNING: vdisk1: failed to open piece 2, piece 2 is out of service,
    > piece 2 being taken off-line" messages. Only option is to dump to disk,
    > then power off / reboot.
    >
    > - Tried to boot system using "Boot: hd(40)unix root=hd(42) swap=hd(41)"
    > instead of the default boot string that uses vdisk 1,2 and 3, and
    > manage to get into single user mode.
    >
    > From here I am stuck. I can open Virtual disk manager through SCOADMIN,
    > but I can't unmirror / rebuild as per the instructions. This is because
    > the system cannot mount /dev/boot on /stand and modify the boot file.
    >
    > I have a replacement disk which I have fdisk'd and divvy'd to the exact
    > values as the parity disk.
    >
    > What is my next step?

    When you say things like "the system cannot mount /dev/boot on /stand",
    it's best to show us the actual error messages so we have some idea
    _why_, and thus how to fix the problem.

    But I can guess. /dev/boot probably also refers to a vdisk device. I
    think vdisk leaves the original device nodes in place under a modified
    name; if you `ls -l /dev/boot*`, do you see a node whose major,minor
    numbers are 1,40? /dev/boot itself will probably be a minor of the
    vdisk driver.
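
    A minimal sketch of that check, for picking the 1,40 node out of a
    long `ls -l` listing. The function name find_node is hypothetical,
    and it assumes an awk that accepts -v and an `ls -l` that prints
    "major, minor" where a file size would normally appear:

```shell
# find_node MAJOR MINOR
# Reads `ls -l` output on stdin and prints the type (b or c) and name of
# every device node whose major,minor numbers match.
find_node() {
    awk -v maj="$1" -v min="$2" '
        /^[bc]/ {
            type = substr($1, 1, 1)      # b = block node, c = character (raw) node
            gsub(/, */, ", ")            # normalize "1,40" to "1, 40"
            for (i = 2; i < NF; i++)
                if ($i == (maj ",") && $(i + 1) == min) {
                    print type, $NF
                    break
                }
        }'
}
```

    Usage would be `ls -l /dev/boot* /dev/rboot* | find_node 1 40`.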

    Either `mount /dev/boot.orig /stand` (whatever the 1,40 device node is
    called); or make your own: `mknod /dev/boot.real b 1 40; mknod
    /dev/rboot.real c 1 40; mount /dev/boot.real /stand`.
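
    If you would rather see what is about to be created before touching
    /dev, here is a dry-run sketch of the same commands. The function
    names are hypothetical; the node names and the 1,40 numbers are the
    ones from the commands above. It only prints the commands, so it can
    be run as any user:

```shell
# plan_node PATH TYPE MAJOR MINOR
# Prints the mknod command for PATH, or a note if PATH already exists.
plan_node() {
    if [ -e "$1" ]; then
        echo "# $1 already exists, leaving it alone"
    else
        echo "mknod $1 $2 $3 $4"
    fi
}

# plan_real_nodes DIR
# Prints the commands for the block and raw nodes plus the test mount.
plan_real_nodes() {
    plan_node "$1/boot.real" b 1 40
    plan_node "$1/rboot.real" c 1 40
    echo "mount $1/boot.real /stand"
}
```

    Run `plan_real_nodes /dev`, read the output, then type the commands
    in by hand as root.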

    Do that just to verify that you can get access to the un-mirrored stand
    filesystem. Then you'll probably want to actually rearrange the device
    nodes; something like this:

      cd /dev
      mv boot boot.vdisk
      mv boot.real boot
      # also rearrange the raw division nodes; it will get _very_ confusing
      # later if these are out of sync with the block nodes
      mv rboot rboot.vdisk
      mv rboot.real rboot
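
    The same rearrangement can be wrapped in a small function that
    refuses to run unless both the block and raw pairs are present, so
    the two never end up out of sync. This is only a sketch: the name
    swap_nodes is hypothetical, and it assumes a shell whose `[` supports
    -e and that the .real nodes were created as above:

```shell
# swap_nodes DIR
# Renames DIR/boot and DIR/rboot to *.vdisk and moves the *.real nodes
# into their place.
swap_nodes() {
    dir=$1
    # Check both pairs first so the common failure cases are caught
    # before anything is renamed.
    for n in boot rboot; do
        if [ ! -e "$dir/$n" ] || [ ! -e "$dir/$n.real" ]; then
            echo "swap_nodes: need both $dir/$n and $dir/$n.real" >&2
            return 1
        fi
        if [ -e "$dir/$n.vdisk" ]; then
            echo "swap_nodes: $dir/$n.vdisk already exists" >&2
            return 1
        fi
    done
    for n in boot rboot; do
        mv "$dir/$n" "$dir/$n.vdisk" || return 1
        mv "$dir/$n.real" "$dir/$n" || return 1
    done
}
```

    Call it as `swap_nodes /dev`. To undo it later, move the nodes back
    by hand in the reverse order.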

    From there, scoadmin might do better at rebuilding the mirrors. I'm not
    sure, I've rarely touched vdisk at all...

    >Bela<

