RE: dealing with a failing drive
- From: "Ted Mittelstaedt" <tedm@xxxxxxxxxxxxxxxx>
- Date: Sun, 25 Nov 2007 21:08:41 -0800
Are we looking at the same output?
Here's the output of idacontrol show off one of my DL360 servers:
mail# idacontrol show
cmd_show_all()
[Compaq Integrated Array controller]
Controller uptime: 301 hours 54 minutes 22 seconds
Firmware Version: 1.50 (running) 1.50 (ROM)
Revision -
Hardware: 2
Marketing: A
SCSI bus count: 2
Max drives per bus: 16
Maximum request: 65535 blocks
Logical drive 0: 17359MB (35553120 sectors), blocksize=512
Status: Logical drive ok
Mode: Mirroring (RAID1)
Drive ID: 00000000
Drive Label:
bus 1 target 0 lun 0:
enclosure 0, bay 0, connector 2J
<COMPAQ BB01813467 3BM0G606000071011MHF 3B07> direct-access
17361MB (35556888 512 byte sectors, 1088 reserved)
Sync, Ultra2, Wide - Configured in a logical volume.
bus 1 target 1 lun 0:
enclosure 0, bay 1, connector 2J
<COMPAQ BF01864663 3EV0J0V3000072363NRD 3B0B> direct-access
17361MB (35556888 512 byte sectors, 1088 reserved)
Sync, Ultra2, Wide - Configured in a logical volume.
bus 1 target 7 lun 0:
enclosure 0, bay 7, connector 2J
<COMPAQ PROLIANT 4L2I JB21> non-disk
Async
mail#
There are two physical disks in the server: bus 1 target 0 and
bus 1 target 1. Those ARE the physical disks. If one of them
has failed, then instead of:
Sync, Ultra2, Wide - Configured in a logical volume.
you will see something like:
Sync, Ultra2, Wide - Unconfigured
or nothing at all.
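
The check described above can be sketched in Python. This is only a rough illustration, not part of idacontrol: it assumes the output layout matches the sample in this message, that each device block starts with a "bus N target M lun K:" line, and that a healthy disk carries the "Configured in a logical volume" text. The function name suspect_drives and the SAMPLE text (the output above, edited by hand to show a failed target 1) are made up for the example.

```python
import re

# Header line that starts each physical device block in the sample output.
DEVICE_RE = re.compile(r"^bus (\d+) target (\d+) lun (\d+):$")
# Wording taken from the sample output; other firmware may word it differently.
HEALTHY_MARKER = "Configured in a logical volume"


def suspect_drives(show_output):
    """Return (bus, target) pairs for disks that lack the healthy marker."""
    devices = {}    # (bus, target) -> detail lines that follow the header
    current = None
    for raw in show_output.splitlines():
        line = raw.strip()
        match = DEVICE_RE.match(line)
        if match:
            current = (int(match.group(1)), int(match.group(2)))
            devices[current] = []
        elif current is not None:
            devices[current].append(line)

    suspects = []
    for key, details in devices.items():
        text = "\n".join(details)
        if "non-disk" in text:
            continue  # skip entries such as the non-disk device at target 7
        if HEALTHY_MARKER not in text:
            suspects.append(key)  # "Unconfigured", or no status text at all
    return suspects


# Hypothetical failed-state output: the sample above with target 1 edited.
SAMPLE = """\
bus 1 target 0 lun 0:
enclosure 0, bay 0, connector 2J
<COMPAQ BB01813467 3BM0G606000071011MHF 3B07> direct-access
17361MB (35556888 512 byte sectors, 1088 reserved)
Sync, Ultra2, Wide - Configured in a logical volume.
bus 1 target 1 lun 0:
enclosure 0, bay 1, connector 2J
<COMPAQ BF01864663 3EV0J0V3000072363NRD 3B0B> direct-access
17361MB (35556888 512 byte sectors, 1088 reserved)
Sync, Ultra2, Wide - Unconfigured
bus 1 target 7 lun 0:
enclosure 0, bay 7, connector 2J
<COMPAQ PROLIANT 4L2I JB21> non-disk
Async
"""

print(suspect_drives(SAMPLE))  # prints [(1, 1)]
```

A disk that has vanished from the listing entirely would not show up here, so it is probably worth also comparing the number of disk entries found against the number you expect.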
It is normal for idacontrol to generate soft write errors. The
developer knows about this. There's really no easy way to make
it not happen. It doesn't hurt anything, however.
If the RAID card itself is flaky, you can't really tell that from
software. Even the Windows RAID utilities that HP/Compaq supplies
won't tell you this.
The "by the book" way of troubleshooting these servers is if you get
a disk failure, you immediately swap the disk. Then if the failure
happens again and you're pretty sure it's not the disk, you down the
server, boot it into Compaq Diagnostics, and let it run for a day or so.
It is not uncommon to end up with several additional hard drives
that you don't need in the process of identifying a bad RAID card
in a server. We have all done it; it is part of the territory. If
you cannot afford it, stay away from these servers. Remember these
servers are designed for a medium to large corporation that has
a lot of resources.
To give you a typical scenario, a couple of weeks ago one of our mailservers
running on a Proliant 1600R started freezing up. I had the admin
pull the entire disk array and put the disks into our backup server,
which went online in place of the original server, and the original
server was pulled and put on a test bench. About a week later, after
much hair-pulling, running of diagnostics, and so on, the admin
finally discovered the processor board had worked its way almost
out of the socket.
Ted
-----Original Message-----
From: owner-freebsd-questions@xxxxxxxxxxx
[mailto:owner-freebsd-questions@xxxxxxxxxxx]On Behalf Of David Newman
Sent: Sunday, November 25, 2007 2:58 PM
To: Ted Mittelstaedt
Cc: freebsd-questions@xxxxxxxxxxx
Subject: Re: dealing with a failing drive
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 11/24/07 12:39 PM, Ted Mittelstaedt wrote:
> The output of idacontrol show will show if one of the
> hard disks in the SmartArray has failed. Your choice with
> a hardware array is to either run it with redundancy or not.
> (ie: raid5 or mirroring or striping) You have to choose
> which is more important for you.
> IMHO it is very foolish to stripe an array that you have
> critical data on and assume that you can predict a failure
> of a disk using smart or other monitoring, and replace it
> in advance of a failure. If your concern is redundancy, then
> add more disks to the array and create a raid 5 or a mirror.
> Then ignore all the predictive junk and let the array card
> concern itself with detecting if a drive has failed. Run
> idacontrol periodically out of a script that checks for a
> failure of a disk and e-mails you if there is one.
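
A periodic check of the kind suggested above might look roughly like the following Python sketch. It is an illustration under assumptions, not a tested monitoring tool: it assumes "idacontrol show" prints a "Status: Logical drive ok" line for a healthy logical drive (as in the sample output in this thread), treats any other status wording as a problem because the failure wording is not known here, and assumes a mail server listening on localhost. The names build_alert and check_and_mail and the address admin@example.com are placeholders.

```python
import smtplib
import socket
import subprocess
from email.message import EmailMessage

# Healthy status wording, taken from the sample output in this thread.
OK_STATUS = "Status: Logical drive ok"


def build_alert(show_output, host, recipient):
    """Return an EmailMessage describing a problem, or None if all is ok."""
    statuses = [line.strip() for line in show_output.splitlines()
                if line.strip().startswith("Status:")]
    # Anything other than the known-good line is reported, since the
    # exact failure wording is an unknown here.
    problems = [s for s in statuses if s != OK_STATUS]
    if not statuses:
        problems.append("no logical drive status found in output")
    if not problems:
        return None

    msg = EmailMessage()
    msg["Subject"] = f"idacontrol reports a problem on {host}"
    msg["From"] = f"root@{host}"
    msg["To"] = recipient
    msg.set_content("\n".join(problems) + "\n\n" + show_output)
    return msg


def check_and_mail(recipient="admin@example.com"):
    """Run idacontrol once and mail the output if something looks wrong."""
    result = subprocess.run(["idacontrol", "show"],
                            capture_output=True, text=True)
    msg = build_alert(result.stdout, socket.gethostname(), recipient)
    if msg is not None:
        # Assumes a local MTA accepting mail on port 25.
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)
```

To use it, add a call to check_and_mail() at the end of the file and run the file from cron at whatever interval suits you. Mailing the full idacontrol output along with the alert saves a login when deciding which bay to pull.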
Thanks, this is good advice, but it doesn't answer the specific
questions I had:
1. How to diagnose the health of a *physical* disk that's part of a RAID
array (RAID1, in this case) in an old Compaq Proliant server?
2. Is it normal for idacontrol to generate soft write errors?
Backstory here is that Proliant server #1 generated beaucoup hard and
soft read and write errors and eventually locked up. I thought it was
one of the disks but replacing one at a time didn't help. So I took both
disks and put them in identical Proliant server #2. Ergo, I would
conclude server #1's RAID controller flaked out.
idacontrol is useful for telling the health of the logical disk. What it
doesn't tell me (or maybe I just don't see it) is whether the physical
disks are ok, and those "soft write errors" concern me. I had a failure
situation, and need to figure out whether just the controller was bad or
whether I need to replace at least one disk too.
Thanks again!
dn
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (Darwin)
iD8DBQFHSf39yPxGVjntI4IRAp1yAJ4vMV9FkeaBsHRr/Z5WpCL27wJ3tACfS+pT
3UVlscnQUZhe8ulHksKDWsY=
=Om7/
-----END PGP SIGNATURE-----
_______________________________________________
freebsd-questions@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to
"freebsd-questions-unsubscribe@xxxxxxxxxxx"