[HPADM] [SUMMARY] Can a mirrored boot disk be hot-replaced

From: Garner, Jim - DIT (garnerjr@ci.richmond.va.us)
Date: 04/16/03

    From: "Garner, Jim  -  DIT" <garnerjr@ci.richmond.va.us>
    To: "'hpux-admin@dutchworks.nl'" <hpux-admin@dutchworks.nl>
    Date: Wed, 16 Apr 2003 16:01:36 -0400
    
    

    Here's the original post:

    > We had a failed, internal disk drive in a rp5470 (L-class). It
    > was a mirror of the boot disk. The system kept running. An HP
    > engineer was dispatched with a replacement disk. He said it would
    > be necessary to shutdown the system and bring it up in single user
    > mode to vgsync the logical volumes. The vgsync took about 45
    > minutes. Adding in the time to shutdown and boot, the system was
    > down for an hour. Management wants to know why it was necessary
    > to take the system down. I called HP and was told, "Hot swap and
    > hot replace are not the same thing. You risk damaging the system
    > bus, crashing the operating system, or corrupting your data."
    >
    > I would like to receive some opinions on this. I will summarize.
    >
    > Extra info:
    > In a document entitled "LVM: Procedure for replacing an LVM disk
    > in HP-UX 10.x and 11.x" (Document ID KBAN00000347), HP describes a
    > procedure for replacing a mirrored root volume in which a shutdown
    > is done. But there is this note:
    >
    > "Note: If the disk being replaced is Hot-Pluggable (or Hot-
    > Swappable) a reboot may not be necessary. Please inquire your
    > customer engineer to determine if a reboot is required."

    I really appreciate all the replies I received. I wish I could
    report that there was a consensus, but there was not. The tally
    from the list was: 3 agree with the "reboot, single-user vgsync"
    approach, 6 agree on the feasibility of online rebuild, and 6 were
    on the fence.

    I e-mailed HP and asked for a clear statement of why the vgsync
    could not be done in multi-user mode. Here is what I received:

    > Well after touching base with [the HPCE],
    > the SIT-UNIX software team I have arrived at the following
    > explanation. And, I believe this is the same explanation
    > offered to Jim.
    > Since this was your ROOT disk the information had to be re-synced
    > from the other drive. Therefore to prevent a possibility of
    > data corruption this operation had to be done in single user
    > mode. No one given the ability to logon while the rebuild was
    > in place. If users had been allowed to logon while the re-sync
    > (rebuild) was taking place the resync would have taken infinitely
    > longer and data corruption of your root disk greatly increased.

    I would prefer some deeper insight into this situation, but I don't
    think I'm going to get it. Things I wonder about:
    An lvdisplay -v of any LV on the disk showed that some extents on
    the failed disk were stale. I assume they represent disk blocks
    which had been updated on the good side of the mirror. I bet if I
    tried to split the mirror, the command would hang. I guess the
    question is, if I just replace the disk, will the LVM subsystem
    notice the disk is foreign before I can do a vgcfgrestore? If the
    answer is no, the system may read blocks that are still marked as
    current, and as a result some bad data might get written onto the
    good side of the mirror. If the answer is yes, then LVM should be
    smart enough to not use the disk until it is synced. I know that
    under normal circumstances I can lvsplit a mirror and later lvmerge
    it, and it will sync without corruption while the system is in use.
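
A minimal sketch of how the stale extents mentioned above could be counted from `lvdisplay -v` output. The function name `count_stale_extents` and the sample text are illustrative assumptions; the sample only imitates the general shape of the logical-extent listing for a two-way mirror and was not captured from the system in question.

```shell
#!/bin/sh
# Count logical extents that have at least one mirror copy marked "stale".
# Reads `lvdisplay -v` style text on stdin and prints a single number.
count_stale_extents() {
    awk '{
        # A line counts once, even if both copies are stale.
        for (i = 1; i <= NF; i++) if ($i == "stale") { n++; break }
    } END { print n + 0 }'
}

# Made-up sample in the assumed layout of the "Logical extents" section.
sample='   --- Logical extents ---
   LE    PV1                PE1   Status 1 PV2                PE2   Status 2
   00000 /dev/dsk/c1t6d0    00000 current  /dev/dsk/c2t6d0    00000 stale
   00001 /dev/dsk/c1t6d0    00001 current  /dev/dsk/c2t6d0    00001 current
   00002 /dev/dsk/c1t6d0    00002 current  /dev/dsk/c2t6d0    00002 stale'

printf '%s\n' "$sample" | count_stale_extents   # prints 2
```

On a live system the same function would presumably be fed with something like `lvdisplay -v /dev/vg00/lvol3 | count_stale_extents`; a result of 0 after the vgsync would suggest the mirror is fully current again.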

    Anyway, thanks again for the interest, and if I hear any more that is
    worth sharing, I'll post a supplemental summary.

    Jim Garner
    Systems Engineer
    City of Richmond, Virginia

    Following are the replies I received.

    ====================================================================

    Paveza, Gary [gary.paveza@AIG.COM] wrote:
    I believe your problem was that it was an internal disk. They are
    not designed as hot-swappable. There are units (Jamaicas) which
    allow for hot-swapping.

    ====================================================================

    LAVERY,MIKE (HP-UnitedKingdom,ex1) [mike.lavery@hp.com] wrote:
    you need to make sure your components/disks are hot-swappable as
    these can be replaced while the system is running.

    Hot-pluggable is not enough if you want to replace a component
    online. More than likely you will need to shutdown the system.

    ====================================================================

    Abramson, Stuart [SAbramson@Wabtec.com] wrote:
    If you had hot-pluggable disks and the failed disk was mirrored,
    then you didn't have to shut down.

    Here is what you do:

       a. Replace physical disk:

             Call HP Response Center. Request replacement disk.
             CE replaces disk. These disks are "hot-pluggable".

       b. The two boot disks in our scenario are:

             cLt6d0
             cRt6d0

       c. Rebuild the disk from vgcfgbackup

             pvcreate -B /dev/rdsk/cNt6d0 # N is L or R, as in (b)
             mkboot -l /dev/rdsk/cNt6d0
             mkboot -a "hpux -lq (;0)/stand/vmunix" /dev/rdsk/cNt6d0
             vgcfgrestore -n vg00 /dev/rdsk/cNt6d0
             vgsync vg00

    Now I'm kind of surprised that a CE, who should know what he is
    doing, didn't know this. There may be more to your story:

    Were each and every logical volume on the failed disk mirrored
    properly?

    ====================================================================

    Thomas V. Myers [tvmyers@ic.delcoelect.com] wrote:
    The HP FSE was completely and utterly wrong. The four internal disk
    drive slots on the rp54xx family are on two SCSI channels.
    Normally, you mirror across the channels. The drives are in fact,
    hot replaceable. You also don't have to perform the resync in
    single-user mode.

    ====================================================================

    bill.thompson@goodyear.com wrote:
    This is what I was told by a reputable HP Engineer: The definition
    of Hot Swap has changed from time to time but you should be able to
    change the internal drive on an rp5470 without a reboot. It is
    preferred that you lvreduce the logical volumes to remove the mirror
    beforehand and then re-establish the mirroring after the drive has
    been replaced, but even that is not required.

    I was told the statement "You risk damaging the system bus, crashing
    the operating system, or corrupting your data." is correct, but you
    risk being hit by lightning every time you step outside (and your
    chances of getting hit by lightning are probably greater).

    HP does rely on the field engineer to make the final decision on
    this. Perhaps there was some particular reason that the field
    engineer decided to shut the system down in this case.

    ====================================================================

    aynal hossain [aynal_hossain@hotmail.com] wrote:
    In my opinion, a boot disk is hot-replaceable if the system has a
    hot-plug facility; otherwise the system has to be brought down to
    single-user mode for the vgsync and then brought back up.

    ====================================================================

    Thornberry, Scott (S.) [sthornbe@ford.com] wrote:
    I get that response a lot. There is a difference between hot swap
    and hot plug, but there is a lot of confusion over which hardware
    is which. We had an HP tech out doing some work at our place, and
    in talking to him, he said it is indeed confusing, and that you
    also need to know the firmware version to be sure whether the
    hardware is hot plug or hot swap.

    I think HP's point, a lot of the time when dealing with a root
    disk, is to have it in single-user mode to prevent anything else
    from happening during your resync. I have done a vgsync while a
    system was up, but it depends on your situation and environment.
    A hot plug, I believe, is something you can replace without a
    power down, whereas a hot swap is something you do on the fly.
    Then again, I have been told our DLT drives were all hot swap,
    only to have a system crash when we skipped the reboot.

    ====================================================================

    Thomas Leber - PA [Thomas_Leber@GMACM.COM] wrote:
    I've never done it with internal drives on L's in particular, but
    lots of times with externals (Jamaicas, SC10s, etc). I'd think as
    long as the drive is the only device on that SCSI bus (or if the
    others are idle), you should be fine.

    In my experience, a lot depends on the particular CE you deal with -
    some have no problem with it; others insist on shutdown

    ====================================================================

    Mike.Keighley@lexicon.co.uk wrote:
    I had a similar conversation with an HP engineer on exactly the same
    subject. He said that I was free to hot-swap it at my risk, but
    they do not recommend that.

    Thinking about it since, perhaps he has a point.
    A pair of mirrored boot disks are both actively writing (including
    the swap file) all the time.
    Even if the disks are on separate buses (which I think they are on
    the L-class), pulling a disk during operation may abort a write,
    and would certainly cause a bus reset. You might hope that LVM
    would cope with that, but can you guarantee it ?

    If the disk is in an array which has a hot-spare facility then that
    is different. When failure is detected you would expect the array
    to spin down the faulty disk, spin up the hot spare, and start a
    rebuild. In this case the faulty disk is guaranteed to be idle, and
    presumably the array is designed to withstand disks being pulled.

    So if you are booting off your EMC, VA7400, FC60 or whatever then no
    shutdown, but booting off the internal disks, bit dodgy.

    As far as the vgcfgrestore & vgsync was concerned, yes we had to do
    that, but we did it with the system up in level 3 and working. I
    can't see why you would need to be in single user mode all that
    time. The engineer did comment that there had been bugs in the past
    which made this risky, but he thought they were all fixed. At your
    own risk again, which being fully patched, I did.

    ====================================================================

    Jeff Cleverley [jeffc@ftc.agilent.com] wrote:
    An interesting question. I'm setting up 4 new 5470s now and went to
    the manual. It's not very clear. Below are some items on page 195
    of the system information manual. I believe I got this doc off of
    the web.

    >
    HotPlug disk drive replacement

    The internal disk drives (up to four) are located at the front right
    side of the server (as you are facing it). When proper software and
    hardware procedures are followed, internal disk drives can be removed
    and replaced while the server is running.

    Just below this is a caution box:

    Disk drives can be removed or installed with the server still
    powered on. This is referred to as a "manual HotPlug".

    However, DO NOT replace a HotPlug disk drive until a controlled
    shutdown of the operating system has been performed.
    <

    This seems very contradictory. What's the point of having
    hotplug/hotswap disks if all it buys you is that the power can be
    on when you replace a disk?

    ====================================================================

    Julius Szelagiewicz [julius@turtle.com] wrote:
    When faced with your quandary, I made another ignite tape and
    swapped the disk. It says "hot swappable", not "hot removable and
    cold insertable". No problems, no downtime, rebuild took about 1
    hour. The lucky part was that it was strictly a system disk, so the
    real changes are minimal and an ignite backup was perfectly adequate.

    ====================================================================

    Christopher H Vann [vannc@dteenergy.com] wrote:
    If the disk is hot swappable:
    We pull the bad one, insert the new one, vgcfgrestore it,
    vgchange -a y the VG and vgsync it.

    If it's not hot swappable:
    We remove the disk from the VG, shutdown, replace the disk, boot up
    off the one good disk and re-mirror it. - OR - shutdown, replace
    bad disk, boot up without quorum on good disk, then run like above
    (vgcfgrestore,...)
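
The hot-swappable case in the reply above can be sketched as a dry run that only prints the commands in order. The function name `plan_hot_replace` and the device name `c2t6d0` are placeholders chosen for illustration, not values from the original system; for a boot disk, the mkboot steps listed in an earlier reply would presumably be needed as well.

```shell
#!/bin/sh
# Dry run only: prints the LVM commands, does not execute them.
# $1 = volume group name (e.g. vg00), $2 = disk name (e.g. c2t6d0, hypothetical)
plan_hot_replace() {
    vg=$1
    raw=/dev/rdsk/$2
    # Restore the saved LVM configuration onto the replacement disk.
    echo "vgcfgrestore -n /dev/$vg $raw"
    # Re-activate the volume group so LVM reattaches the physical volume.
    echo "vgchange -a y /dev/$vg"
    # Resynchronise the stale mirror copies.
    echo "vgsync /dev/$vg"
}

plan_hot_replace vg00 c2t6d0
```

Printing rather than executing keeps the sketch safe to read through on any machine; the output can be reviewed against the vgcfgbackup in use before anything is typed on the real host.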

    ====================================================================

    Marvin Blackburn [mblackburn@glenraven.com] wrote:
    At one time, hp had stated that there were certain circumstances
    when it was advisable to shut the system down to replace the disk.
    That was even in an alert they sent out and posted. However,
    when we followed up on this, they stated that it was no longer the
    case and that we could replace them hot.

    We have done this several times on L's, K's, and D's without
    incident. Our ce even does this.

    ====================================================================

    Illgen Steve 448 [steve.illgen@crackerbarrel.com] wrote:
    I believe you can replace a "hot-swappable" mirrored boot disk on an
    L2000 as long as you first remove the disk from the Volume Group.
    Once the new disk is in place, add it back into the VG and
    reestablish the mirror.

    I had to do this this on an N-Class and did not experience any
    problems.
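
A rough dry-run sketch of the "remove the disk from the VG, then add it back and re-mirror" approach, again printing commands rather than running them. The function name `plan_remirror`, the disk `c2t6d0` and the logical volume names are assumptions for illustration. Whether lvreduce and vgreduce complete cleanly against a disk that has already failed is not something this thread settles, so treat the first half in particular as unverified.

```shell
#!/bin/sh
# Dry run only: prints the commands, does not execute them.
# $1 = volume group, $2 = disk name, remaining args = mirrored LV names
plan_remirror() {
    vg=$1
    disk=$2
    shift 2
    blk=/dev/dsk/$disk
    raw=/dev/rdsk/$disk
    # Before the swap: drop each mirror copy on the bad disk, then remove
    # the disk from the volume group.
    for lv in "$@"; do
        echo "lvreduce -m 0 /dev/$vg/$lv $blk"
    done
    echo "vgreduce /dev/$vg $blk"
    echo "# ... physically replace the disk here ..."
    # After the swap: make the new disk a bootable PV, add it to the VG,
    # write the boot area, and re-create one mirror copy per LV.
    echo "pvcreate -B $raw"
    echo "vgextend /dev/$vg $blk"
    echo "mkboot $raw"
    for lv in "$@"; do
        echo "lvextend -m 1 /dev/$vg/$lv $blk"
    done
}

plan_remirror vg00 c2t6d0 lvol1 lvol2 lvol3
```

Compared with the vgcfgrestore route, this variant rebuilds the mirrors from scratch with lvextend, so no separate vgsync step appears in the plan.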

    ====================================================================

    Scalone, Galen [Galen.Scalone@vacationclub.com] wrote:
    You should be able to hot swap the old disk out, add the new disk,
    do your mkboots,and use vgcfgrestore to restore vg config to that
    disk, then vgsync. No downtime.

    ====================================================================

    "Beerse, Corné" [c.beerse@torex-hiscom.nl] wrote:
    I'm just another sysadmin doing some philosophy. I'm
    thinking about the parts that will work and the parts that might
    fail...

    If there are no other hiccups, the hot replace should work as
    expected. With hot-replaceable, hot-swappable and otherwise
    capable hardware, filesystem and software, there should be no
    problem at all. Then, from Compaq (now also HP) documentation,
    I recall that the initialisation of the new disk to become a new
    mirror can take up to 15 minutes per gigabyte. Hence, with a 72 GB
    disk, that can take 18 hours. During this time, the entire system
    is not protected by the mirror, the I/O is relatively (i.e. very)
    slow, and there are other discomforts.

    Then, during the time the mirror is not fully recovered, what
    happens if... Is the new mirror directly bootable, or only after
    the recovery has finished? Is the old disk from the same batch as
    the crashed disk (might it also fail relatively soon)?
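
The worst-case estimate above can be checked with a line of shell arithmetic, reading the quoted rate as 15 minutes per gigabyte, which is what the 18-hour total for a 72 GB disk implies. The function name `resync_hours` is made up for this sketch.

```shell
#!/bin/sh
# Estimate resync duration in whole hours.
# $1 = minutes needed per gigabyte, $2 = disk size in gigabytes
resync_hours() {
    minutes_per_gb=$1
    size_gb=$2
    echo $(( minutes_per_gb * size_gb / 60 ))
}

resync_hours 15 72   # prints 18
```

For comparison, the vgsync in the original post took about 45 minutes, so the documented figure looks like an upper bound rather than a typical value.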

    ====================================================================

    Ben Le [ble@pcc.edu] wrote:
    I had this problem before. Yes, HP recommends shutting down the
    system before replacing the disk, to play it safe. After the disk
    is replaced, the reboot will automatically re-sync the mirror if
    you are running disk mirroring on your system. As I understand it,
    vgsync can be done while the system is up.

    ====================================================================

    Matthew.Gibson@Microchip.com wrote:
    I have an RP5405 ( L3000 ) and had the same question. I found the
    following on the ITRC website on Page 47 of the PDF file attached.

    Hot-Plug Disk Drives
    The L-Class has four embedded SCSI disks accessible from the front
    of the server. These disks can be removed and inserted while the
    L-Class continues to operate. This operation is called "hot-plug,"
    and it is different from "hot-swap."

    During both hot-plug and hot-swap operations, the power remains on
    and the system continues to function. However, hot-swap means that
    the assembly can be removed, added, or replaced without informing
    the system. Hot-plug requires the assembly to be de-configured
    before removal and reconfigured before the system can utilize the
    newly inserted assembly. Because disks have unique information
    stored on them, hot-plug methods are used. Fans and power supplies
    in the L-Class are hot-swap assemblies.

    ====================================================================

    --
                 ---> Please post QUESTIONS and SUMMARIES only!! <---
            To subscribe/unsubscribe to this list, contact majordomo@dutchworks.nl
           Name: hpux-admin@dutchworks.nl     Owner: owner-hpux-admin@dutchworks.nl
     
     Archives:  ftp.dutchworks.nl:/pub/digests/hpux-admin       (FTP, browse only)
                http://www.dutchworks.nl/htbin/hpsysadmin   (Web, browse & search)
    
