Failed mirrored drives -- how to rebuild mirror with replacement drive?

From: Cohen, Andy (Andy.Cohen_at_cognex.com)
Date: 08/16/05

    Date: Tue, 16 Aug 2005 15:17:24 -0400
    To: Tru64 Managers <tru64-unix-managers@ornl.gov>
    
    

    Hi,

    We have a DS20E running Tru64 5.1. There are two internal disks that
    are mirrored by LSM. Overnight one of them failed, which brought the
    server down (see my previous email). We've rebooted off the one good
    remaining drive. In the meantime we've removed the bad drive and have
    received a replacement for it (DS-RZ2ED-LS). I'm told that this is
    hot-swappable, so I can put it in with the system up and rebuild the
    mirror.

    My questions are:

    1) Once the new blank drive is installed, how do I make sure the
    system doesn't treat it as the good drive and resync the existing
    drive from it, thereby wiping out the entire system disk? Is that
    even a possibility?

    2) How do I rebuild the mirror?

    Here's the volprint output:

    root@odin==> volprint
    Disk group: rootdg

    TY NAME         ASSOC        KSTATE   LENGTH   PLOFFS STATE    TUTIL0 PUTIL0
    dg rootdg       rootdg       -        -        -      -        -      -

    dm dsk0f-AdvFS  -            -        -        -      NODEVICE -      -
    dm dsk0h        -            -        -        -      NODEVICE -      -
    dm dsk1d-AdvFS  dsk1d        -        4267761  -      -        -      -
    dm dsk1f-AdvFS  dsk1f        -        12878154 -      -        -      -
    dm dsk1h        dsk1h        -        4301507  -      -        -      -
    dm root01       -            -        -        -      NODEVICE -      -
    dm root02       dsk1a        -        636421   -      -        -      -
    dm swap01       -            -        -        -      NODEVICE -      -
    dm swap02       dsk1b        -        12354045 -      -        -      -
    dm usr01        -            -        -        -      NODEVICE -      -

    v  rootvol      root         ENABLED  636421   -      ACTIVE   -      -
    pl rootvol-01   rootvol      DISABLED 636421   -      NODEVICE -      -
    sd root01-01p   rootvol-01   DISABLED 16       0      NODEVICE -      -
    sd root01-01    rootvol-01   DISABLED 636405   16     NODEVICE -      -
    pl rootvol-02   rootvol      ENABLED  636421   -      ACTIVE   -      -
    sd root02-02p   rootvol-02   ENABLED  16       0      -        -      -
    sd root02-02    rootvol-02   ENABLED  636405   16     -        -      -

    v  swapvol      swap         ENABLED  12354045 -      ACTIVE   -      -
    pl swapvol-02   swapvol      ENABLED  12354045 -      ACTIVE   -      -
    sd swap02-02    swapvol-02   ENABLED  12354045 0      -        -      -
    pl swapvol-01   swapvol      DISABLED 12354045 -      NODEVICE -      -
    sd swap01-01    swapvol-01   DISABLED 12354045 0      NODEVICE -      -

    v  usrvol       fsgen        ENABLED  4267761  -      ACTIVE   -      -
    pl usrvol-02    usrvol       ENABLED  4267761  -      ACTIVE   -      -
    sd dsk1d-01     usrvol-02    ENABLED  4267761  0      -        -      -
    pl usrvol-01    usrvol       DISABLED 4267761  -      NODEVICE -      -
    sd usr01-01     usrvol-01    DISABLED 4267761  0      NODEVICE -      -

    v  vol-dsk0f    fsgen        ENABLED  12878154 -      ACTIVE   -      -
    pl vol-dsk0f-02 vol-dsk0f    ENABLED  12878154 -      ACTIVE   -      -
    sd dsk1f-01     vol-dsk0f-02 ENABLED  12878154 0      -        -      -
    pl vol-dsk0f-01 vol-dsk0f    DISABLED 12878154 -      NODEVICE -      -
    sd dsk0f-01     vol-dsk0f-01 DISABLED 12878154 0      NODEVICE -      -

    To my untrained eye it looks like dsk0 was the failed drive.
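
    One way to double-check that reading is to filter the volprint
    output mechanically. The following is only a rough Python sketch,
    not anything shipped with LSM: the function names are made up for
    illustration, SAMPLE is an abridged copy of the output above, and it
    assumes the STATE value is always the seventh whitespace-separated
    field of a record.

```python
# Sketch: pull the NODEVICE records out of volprint output.
# SAMPLE is an abridged, hand-copied subset of the listing above.
SAMPLE = """\
Disk group: rootdg
TY NAME ASSOC KSTATE LENGTH PLOFFS STATE TUTIL0
PUTIL0
dm dsk0h - - - - NODEVICE -
-
dm dsk1h dsk1h - 4301507 - - -
-
dm root01 - - - - NODEVICE -
dm root02 dsk1a - 636421 - - -
v rootvol root ENABLED 636421 - ACTIVE -
pl rootvol-01 rootvol DISABLED 636421 - NODEVICE -
sd root01-01 rootvol-01 DISABLED 636405 16 NODEVICE -
pl rootvol-02 rootvol ENABLED 636421 - ACTIVE -
sd root02-02 rootvol-02 ENABLED 636405 16 - -
"""

RECORD_TYPES = {"dg", "dm", "v", "pl", "sd"}


def records(volprint_text):
    """Yield the field list of each record line, skipping everything else."""
    for line in volprint_text.splitlines():
        fields = line.split()
        # Columns: TY NAME ASSOC KSTATE LENGTH PLOFFS STATE TUTIL0 [PUTIL0]
        # Headers, blank lines and a wrapped PUTIL0 "-" are ignored.
        if len(fields) >= 7 and fields[0] in RECORD_TYPES:
            yield fields


def nodevice_records(volprint_text):
    """Return {record type: [names]} for records whose STATE is NODEVICE."""
    found = {}
    for fields in records(volprint_text):
        if fields[6] == "NODEVICE":
            found.setdefault(fields[0], []).append(fields[1])
    return found


def attached_devices(volprint_text):
    """Return the ASSOC device of every dm record that still has one."""
    return [f[2] for f in records(volprint_text)
            if f[0] == "dm" and f[2] != "-"]


failed = nodevice_records(SAMPLE)
attached = attached_devices(SAMPLE)
print(failed)
print(attached)
```

    If every device that is still attached is a dsk1 partition, and
    every NODEVICE record belongs to the other half of a mirror, that
    supports the guess that dsk0 is the drive that failed.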

    Would the following be the series of commands I would issue to rebuild?

    1. Disassociate and remove all plexes associated with failed disk.

    #> volplex -o rm dis rootvol-01 swapvol-01 usrvol-01

    2. Remove the failed objects from LSM.

    #> voldg rmdisk root01 swap01 usr01 dsk0h ## what about 'dsk0f-AdvFS' ?

    #> voldisk rm dsk0a dsk0b dsk0g dsk0d

    3. Replace disk and scan.

    #> hwmgr -scan scsi

    #> dsfmgr -e dskX dsk0 (where dskX=newly scanned disk)

    #> disklabel -rw dsk0

    4. Remirror the boot disk.

    #> volrootmir -a dsk0
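
    As a sanity check on steps 1 and 2, the names given to volplex and
    voldg can be compared against the records that volprint shows in the
    NODEVICE state. This is only a sketch in Python: the lists are
    copied by hand from the output and commands above, the variable
    names are made up for illustration, and it says nothing about
    whether the LSM commands themselves are the right ones.

```python
# Sketch: which NODEVICE records are not covered by steps 1 and 2?
# All names are copied by hand from the volprint output and the
# proposed commands in this message.

# Plexes and disk media records that volprint reports as NODEVICE.
nodevice_plexes = ["rootvol-01", "swapvol-01", "usrvol-01", "vol-dsk0f-01"]
nodevice_disks = ["dsk0f-AdvFS", "dsk0h", "root01", "swap01", "usr01"]

# Names passed to "volplex -o rm dis" (step 1) and "voldg rmdisk" (step 2).
step1_plexes = ["rootvol-01", "swapvol-01", "usrvol-01"]
step2_disks = ["root01", "swap01", "usr01", "dsk0h"]

# Anything NODEVICE that neither step mentions.
missing_plexes = [p for p in nodevice_plexes if p not in step1_plexes]
missing_disks = [d for d in nodevice_disks if d not in step2_disks]

print("plexes not covered by step 1:", missing_plexes)
print("disks not covered by step 2:", missing_disks)
```

    With the lists as copied here, this flags the vol-dsk0f-01 plex and
    the dsk0f-AdvFS disk media record, neither of which is handled by
    the commands as written.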

    Anything here incorrect? Did I miss anything?

    Many, many thanks!
    Andy

