[HPADM] [SUMMARY] Can a mirrored boot disk be hot-replaced
From: Garner, Jim - DIT (garnerjr@ci.richmond.va.us)
Date: 04/16/03
From: "Garner, Jim - DIT" <garnerjr@ci.richmond.va.us>
To: "'hpux-admin@dutchworks.nl'" <hpux-admin@dutchworks.nl>
Date: Wed, 16 Apr 2003 16:01:36 -0400
Here's the original post:
> We had a failed, internal disk drive in a rp5470 (L-class). It
> was a mirror of the boot disk. The system kept running. An HP
> engineer was dispatched with a replacement disk. He said it would
> be necessary to shutdown the system and bring it up in single user
> mode to vgsync the logical volumes. The vgsync took about 45
> minutes. Adding in the time to shutdown and boot, the system was
> down for an hour. Management wants to know why it was necessary
> to take the system down. I called HP and was told, "Hot swap and
> hot replace are not the same thing. You risk damaging the system
> bus, crashing the operating system, or corrupting your data."
>
> I would like to receive some opinions on this. I will summarize.
>
> Extra info:
> In a document entitled "LVM: Procedure for replacing an LVM disk
> in HP-UX 10.x and 11.x" (Document ID KBAN00000347), HP describes a
> procedure for replacing a mirrored root volume in which a shutdown
> is done. But there is this note:
>
> "Note: If the disk being replaced is Hot-Pluggable (or Hot-
> Swappable) a reboot may not be necessary. Please inquire your
> customer engineer to determine if a reboot is required."
I really appreciate all the replies I received. I wish I could
report that there was a consensus, but there was not. The tally
from the list was: 3 agree with the "reboot, single-user vgsync"
approach, 6 agree on the feasibility of online rebuild, and 6 were
on the fence.
I e-mailed HP and asked for a clear statement of why the vgsync
could not be done in multi-user mode. Here is what I received:
> Well after touching base with [the HPCE],
> the SIT-UNIX software team I have arrived at the following
> explanation. And, I believe this is the same explanation
> offered to Jim.
> Since this was your ROOT disk the information had to be re-synced
> from the other drive. Therefore to prevent a possibility of
> data corruption this operation had to be done in single user
> mode. No one was given the ability to log on while the rebuild was
> in progress. If users had been allowed to log on while the re-sync
> (rebuild) was taking place, the resync would have taken infinitely
> longer and the risk of data corruption of your root disk would have
> greatly increased.
I would prefer some deeper insight into this situation, but I don't
think I'm going to get it. Things I wonder about:
An lvdisplay -v of any LV on the disk showed that some extents on
the failed disk were stale. I assume they represent disk blocks
which had been updated on the good side of the mirror. I bet if I
tried to split the mirror, the command would hang. I guess the
question is, if I just replace the disk, will the LVM subsystem
notice the disk is foreign before I can do a vgcfgrestore? If the
answer is no, the system may read blocks that are still marked as
current, and as a result some bad data might get written onto the
good side of the mirror. If the answer is yes, then LVM should be
smart enough to not use the disk until it is synced. I know that
under normal circumstances I can lvsplit a mirror and later lvmerge
it, and it will sync without corruption while the system is in use.
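One way to gauge how far the mirror has drifted is to count the stale
entries in the lvdisplay -v output. A minimal sketch, assuming stale
extents show up as lines containing the word "stale" in the extent map;
count_stale is a hypothetical helper name, and /dev/vg00/lvol1 is only
an example logical volume:

```shell
#!/bin/sh
# Count the lines of `lvdisplay -v` output (read from stdin) that
# mention a stale extent. Assumption: each stale extent is reported on
# a line containing the word "stale"; count_stale is a hypothetical name.
count_stale() {
    # grep -c prints 0 and returns non-zero when nothing matches,
    # so "|| true" keeps the helper usable under "set -e".
    grep -c 'stale' || true
}

# Intended use on the live system (not executed here):
#   lvdisplay -v /dev/vg00/lvol1 | count_stale
```

A count that keeps growing while the failed disk is still configured
would suggest LVM is tracking the writes it could not mirror, which is
what a later vgsync has to copy across.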
Anyway, thanks again for the interest, and if I hear any more that is
worth sharing, I'll post a supplemental summary.
Jim Garner
Systems Engineer
City of Richmond, Virginia
Following are the replies I received.
====================================================================
Paveza, Gary [gary.paveza@AIG.COM] wrote:
I believe your problem was that it was an internal disk. They are
not designed to be hot-swappable. There are units (Jamaicas) which
do allow for hot-swapping.
====================================================================
LAVERY,MIKE (HP-UnitedKingdom,ex1) [mike.lavery@hp.com] wrote:
you need to make sure your components/disks are hot-swappable as
these can be replaced while the system is running.
Hot-pluggable is not enough if you want to replace a component
online. More than likely you will need to shutdown the system.
====================================================================
Abramson, Stuart [SAbramson@Wabtec.com] wrote:
If you had hot-pluggable disks and the failed disk was mirrored,
then you didn't have to shut down.
Here is what you do:
a. Replace physical disk:
Call HP Response Center. Request replacement disk.
CE replaces disk. These disks are "hot-pluggable".
b. The two boot disks in our scenario are:
cLt6d0
cRt6d0
c. Rebuild the disk from vgcfgbackup
pvcreate -B /dev/rdsk/cNt6d0 # N is [RL] no.
mkboot -l /dev/rdsk/cNt6d0
mkboot -a "hpux -lq (;0)/stand/vmunix" /dev/rdsk/cNt6d0
vgcfgrestore -n vg00 /dev/rdsk/cNt6d0
vgsync vg00
Now I'm kind of surprised that a CE, who should know what he is
doing, didn't know this. There may be more to your story:
Were each and every logical volume on the failed disk mirrored
properly?
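The sequence in step c can be wrapped in a small dry-run helper that
only prints the commands for a given disk, so they can be reviewed
before anything is run as root. This is a sketch, not a tested
procedure: rebuild_plan is a hypothetical name, and vg00 and
/dev/rdsk/c2t6d0 are illustrative values standing in for the real
volume group and replaced disk.

```shell
#!/bin/sh
# Print (do not run) the rebuild sequence quoted above for one replaced
# boot-mirror disk. rebuild_plan is a hypothetical helper name.
rebuild_plan() {
    vg=$1      # volume group, e.g. vg00
    disk=$2    # raw device file of the replaced disk
    cat <<EOF
pvcreate -B $disk
mkboot -l $disk
mkboot -a "hpux -lq (;0)/stand/vmunix" $disk
vgcfgrestore -n $vg $disk
vgsync $vg
EOF
}

# Illustrative values only; substitute the real volume group and device.
rebuild_plan vg00 /dev/rdsk/c2t6d0
```

Christopher Vann's reply further down adds a vgchange -a y of the
volume group between the vgcfgrestore and the vgsync.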
====================================================================
Thomas V. Myers [tvmyers@ic.delcoelect.com] wrote:
The HP FSE was completely and utterly wrong. The four internal disk
drive slots on the rp54xx family are on two SCSI channels.
Normally, you mirror across the channels. The drives are, in fact,
hot replaceable. You also don't have to perform the resync in
single-user mode.
====================================================================
bill.thompson@goodyear.com wrote:
This is what I was told by a reputable HP Engineer: The definition
of Hot Swap has changed from time to time but you should be able to
change the internal drive on an rp5470 without a reboot. It is
preferred that you lvreduce the logical volumes to remove the mirror
beforehand and then re-establish the mirroring after the drive has
been replaced, but even that is not required.
I was told the statement "You risk damaging the system bus, crashing
the operating system, or corrupting your data." is correct, but you
risk being hit by lightning every time you step outside (and your
chances of getting hit by lightning are probably greater).
HP does rely on the field engineer to make the final decision on
this. Perhaps there was some particular reason that the field
engineer decided to shut the system down in this case.
====================================================================
aynal hossain [aynal_hossain@hotmail.com] wrote:
In my opinion, a boot disk is hot replaceable if it has a hot-plug
facility; otherwise the system has to be brought down to single-user
mode for the vgsync and then brought back up.
====================================================================
Thornberry, Scott (S.) [sthornbe@ford.com] wrote:
I get that response a lot. There is a difference between hot swap
and hot plug, however there is a lot of confusion over which hardware
is exactly which. We had an HP tech out doing some work at our place,
and in talking to him, he says it is indeed confusing, but you need
to know the firmware version as well to make it clear whether the
hardware is indeed hot plug or hot swap.
I think HP's point a lot of the time, when dealing with a root disk,
is to have it in single user so as to prevent anything else that may
occur during your resync, but I have done a vgsync while a system
was up; then again, it depends on your situation and environment. A
hot plug, I believe, is something you can replace without a power
down, whereas a hot swap is something you do on the fly, but I have
been told our DLT drives were all hot swaps, only to have a system
crash without doing a boot of the system.
====================================================================
Thomas Leber - PA [Thomas_Leber@GMACM.COM] wrote:
I've never done it with internal drives on L's in particular, but
lots of times with externals (Jamaicas, SC10s, etc). I'd think as
long as the drive is the only device on that SCSI bus (or if the
others are idle), you should be fine.
In my experience, a lot depends on the particular CE you deal with -
some have no problem with it; others insist on shutdown.
====================================================================
Mike.Keighley@lexicon.co.uk wrote:
I had a similar conversation with an HP engineer on exactly the same
subject. He said that I was free to hot-swap it at my risk, but
they do not recommend that.
Thinking about it since, perhaps he has a point.
A pair of mirrored boot disks are both actively writing (including
the swap file) all the time.
Even if the disks are on separate buses (which I think they are on
the L-class), pulling a disk during operation may abort a write,
and would certainly cause a bus reset. You might hope that LVM
would cope with that, but can you guarantee it?
If the disk is in an array which has a hot-spare facility then that
is different. When failure is detected you would expect the array
to spin down the faulty disk, spin up the hot spare, and start a
rebuild. In this case the faulty disk is guaranteed to be idle, and
presumably the array is designed to withstand disks being pulled.
So if you are booting off your EMC, VA7400, FC60 or whatever then no
shutdown, but booting off the internal disks, bit dodgy.
As far as the vgcfgrestore & vgsync was concerned, yes we had to do
that, but we did it with the system up in level 3 and working. I
can't see why you would need to be in single user mode all that
time. The engineer did comment that there had been bugs in the past
which made this risky, but he thought they were all fixed. At your
own risk again, which being fully patched, I did.
====================================================================
Jeff Cleverley [jeffc@ftc.agilent.com] wrote:
An interesting question. I'm setting up 4 new 5470s now and went to
the manual. It's not very clear. Below are some items on page 195
of the system information manual. I believe I got this doc off of
the web.
>
HotPlug disk drive replacement
The internal disk drives (up to four) are located at the front right
side of the server (as you are facing it). When proper software and
hardware procedures are followed, internal disk drives can be removed
and replaced while the server is running.
Just below this is a caution box:
Disk drives can be removed or installed with the server still
powered on. This is referred to as a "manual HotPlug".
However, DO NOT replace a HotPlug disk drive until a controlled
shutdown of the operating system has been performed.
<
This seems very contradictory. What's the point of having
hotplug/hotswap disks if all it buys you is that the power can be on
when you replace a disk?
====================================================================
Julius Szelagiewicz [julius@turtle.com] wrote:
When faced with your quandary, I made another ignite tape and
swapped the disk. It says "hot swappable", not "hot removable and
cold insertable". No problems, no downtime, rebuild took about 1
hour. The lucky part was that it was strictly a system disk, so the
real changes are minimal and the ignite backup was perfectly adequate.
====================================================================
Christopher H Vann [vannc@dteenergy.com] wrote:
If the disk is hot swappable:
We pull the bad one, insert the new one, vgcfgrestore it,
vgchange -a y the VG and vgsync it.
If it's not hot swappable:
We remove the disk from the VG, shutdown, replace the disk, boot up
off the one good disk and re-mirror it. - OR - shutdown, replace
bad disk, boot up without quorum on good disk, then run like above
(vgcfgrestore,...)
====================================================================
Marvin Blackburn [mblackburn@glenraven.com] wrote:
At one time, HP had stated that there were certain circumstances
when it was advisable to shut the system down to replace the disk.
That was even in an alert they sent out and posted. However,
when we followed up on this, they stated that it was no longer the
case and that we could replace them hot.
We have done this several times on L's, K's, and D's without
incident. Our CE even does this.
====================================================================
Illgen Steve 448 [steve.illgen@crackerbarrel.com] wrote:
I believe you can replace a "hot-swappable" mirrored boot disk on an
L2000 as long as you first remove the disk from the Volume Group.
Once the new disk is in place, add it back into the VG and
reestablish the mirror.
I had to do this on an N-Class and did not experience any
problems.
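The remove-then-re-mirror approach can be sketched as a dry run that
prints the commands rather than executing them. Everything specific
here is an assumption for illustration: remirror_plan is a
hypothetical helper, c2t6d0 and the lvol names are example values, and
whether lvreduce and vgreduce succeed against a disk that has already
failed is not established anywhere in this thread.

```shell
#!/bin/sh
# Print (do not run) a reduce / replace / re-mirror sequence for a
# mirrored boot disk. remirror_plan, the disk name and the lvol list
# are illustrative assumptions, not values from a real system.
remirror_plan() {
    vg=$1; disk=$2; shift 2      # remaining arguments: logical volumes
    # Drop the mirror copy that lives on the failing disk.
    for lv in "$@"; do
        echo "lvreduce -m 0 /dev/$vg/$lv /dev/dsk/$disk"
    done
    echo "vgreduce $vg /dev/dsk/$disk"
    echo "# ... physically replace the disk ..."
    # Bring the new disk in as a bootable physical volume.
    echo "pvcreate -B /dev/rdsk/$disk"
    echo "vgextend $vg /dev/dsk/$disk"
    echo "mkboot -l /dev/rdsk/$disk"
    echo "mkboot -a \"hpux -lq (;0)/stand/vmunix\" /dev/rdsk/$disk"
    # Re-establish the mirrors in the order given, then refresh boot info.
    for lv in "$@"; do
        echo "lvextend -m 1 /dev/$vg/$lv /dev/dsk/$disk"
    done
    echo "lvlnboot -R"
}

remirror_plan vg00 c2t6d0 lvol1 lvol2 lvol3
```

Compared with vgcfgrestore plus vgsync, this route leaves nothing
stale on the volume group while the disk is out, at the cost of
re-copying every extent when the mirrors are extended again.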
====================================================================
Scalone, Galen [Galen.Scalone@vacationclub.com] wrote:
You should be able to hot swap the old disk out, add the new disk,
do your mkboots, and use vgcfgrestore to restore vg config to that
disk, then vgsync. No downtime.
====================================================================
"Beerse, Corné" [c.beerse@torex-hiscom.nl] wrote:
I'm just another sysadmin doing some philosophy. I'm
thinking about the parts that will work and the parts that might
fail...
If there are no other hiccups, the hot replace should work as
expected. With hot-replaceable and hot-swappable and otherwise
capable hardware, filesystem and software, there should be no
problem at all. Then, from Compaq (now also HP) documentation,
I recall the initialisation of the new disk to become a new mirror
can take up to 15 minutes per gigabyte. Hence, with a 72 GB disk,
that can take 18 hours. During this time, the entire system is not
protected by the mirror, the I/O is relatively (i.e. very) slow, and
there is other discomfort.
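The 18-hour figure corresponds to a rate of 15 minutes per gigabyte
(at 15 minutes per megabyte the same disk would need on the order of
18,000 hours). A quick check, with resync_hours as a hypothetical
helper and the rate taken as the recalled worst case, not a measured
value:

```shell
#!/bin/sh
# Estimate resync time in whole hours from disk size and a fixed
# per-gigabyte rate. resync_hours is a hypothetical helper name;
# integer division truncates any fractional hour.
resync_hours() {
    size_gb=$1
    minutes_per_gb=$2
    echo $(( $size_gb * $minutes_per_gb / 60 ))
}

resync_hours 72 15    # 72 * 15 = 1080 minutes
```

For comparison, the vgsync in the original post took about 45 minutes,
so the recalled rate looks like an upper bound rather than a typical
value.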
Then, during the time the mirror is not fully recovered, what
happens if... Is the new mirror directly bootable, or only after a
finished recovery? Is the old disk from the same batch as the
crashed disk (might it also fail relatively soon)?
====================================================================
Ben Le [ble@pcc.edu] wrote:
I had this problem before. Yes, HP recommends shutting down the
system before replacing the disk, to play it safe. After the disk is
replaced, a system reboot will automatically re-sync the mirror if
you are running disk mirroring on your system. As I understand it,
vgsync can be done while the system is up.
====================================================================
Matthew.Gibson@Microchip.com wrote:
I have an RP5405 (L3000) and had the same question. I found the
following on the ITRC website on Page 47 of the PDF file attached.
Hot-Plug Disk Drives
The L-Class has four embedded SCSI disks accessible from the front
of the server. These disks can be removed and inserted while the
L-Class continues to operate. This operation is called "hot-plug,"
and it is different from "hot-swap."
During both hot-plug and hot-swap operations, the power remains on
and the system continues to function. However, hot-swap means that
the assembly can be removed, added, or replaced without informing
the system. Hot-plug requires the assembly to be de-configured
before removal and reconfigured before the system can utilize the
newly inserted assembly. Because disks have unique information
stored on them, hot-plug methods are used. Fans and power supplies
in the L-Class are hot-swap assemblies.
====================================================================
--
---> Please post QUESTIONS and SUMMARIES only!! <---
To subscribe/unsubscribe to this list, contact majordomo@dutchworks.nl
Name: hpux-admin@dutchworks.nl Owner: owner-hpux-admin@dutchworks.nl
Archives: ftp.dutchworks.nl:/pub/digests/hpux-admin (FTP, browse only)
http://www.dutchworks.nl/htbin/hpsysadmin (Web, browse & search)