Re: A little story of failed raid5 (3ware 8000 series)



On 8/25/07, Tom Samplonius <tom@xxxxxxxxxxxxxx> wrote:


----- "Artem Kuchin" <matrix@xxxxxxxxxxx> wrote:
...
But I don't understand how and why it happened. Only 6 hours ago (the
night before) all those files were backed up fine without any read error.
And now, right after replacing the drive and starting the rebuild, it said
that there are bad sectors all over those files.
How come?

What happened to you was an extremely common occurrence. One of your disks
developed a media failure some time ago, but the controller never detected
it, because that particular bad area was never read. Your backups worked
because they never touched that portion of the disk (e.g. empty space,
metadata, etc.). Then another drive developed an electronics failure, which
is detected instantly and put the array into degraded mode. When you rebuilt
onto a replacement drive, the controller discovered that there was a second
failed disk, and that is unrecoverable.
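To see why the latent bad sector becomes fatal only during the rebuild, here is a toy model of a single RAID5 stripe. It is a minimal sketch, assuming one XOR parity block per stripe; the helper names are hypothetical, and a real controller rotates parity across disks and works on sectors rather than Python byte strings.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one XOR parity block (one toy RAID5 stripe)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe):
    """Rebuild a stripe in which unreadable blocks are marked as None.

    With single parity, exactly one missing block can be recomputed as the
    XOR of all surviving blocks; two missing blocks in the same stripe cannot.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError(f"unrecoverable: blocks {missing} lost in the same stripe")
    if not missing:
        return list(stripe)
    survivors = [block for block in stripe if block is not None]
    repaired = list(stripe)
    repaired[missing[0]] = xor_blocks(survivors)
    return repaired


stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])

degraded = list(stripe)
degraded[1] = None  # the drive with the electronics failure
assert rebuild(degraded) == stripe  # one loss: parity recovers the data

degraded[2] = None  # latent bad sector on another disk, first read during rebuild
try:
    rebuild(degraded)
except ValueError as err:
    print(err)
```

The first failure alone is harmless; the rebuild is what finally reads the silently bad area on a second disk, and at that point the stripe has two holes and only one parity block.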

RAID, of any level, isn't magic. It is important to understand how it
works, and to realize that drives can fail passively: a bad area stays
unnoticed until something reads it. BTW, if you were using RAID1 or RAID10,
you would likely have had the same problem (well, RAID10 can survive _some_
double-disk failures). RAID6 is the only RAID level that can survive the
failure of any two disks.
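The claim about double-disk failures can be checked by brute force over every pair of failed disks. A minimal sketch, assuming a 4-disk array and a hypothetical RAID10 layout in which adjacent disks (0,1), (2,3) form mirror pairs:

```python
from itertools import combinations


def survives_raid5(failed, n_disks):
    # Single parity: at most one disk may be lost.
    return len(failed) <= 1


def survives_raid6(failed, n_disks):
    # Dual parity: any two disks may be lost.
    return len(failed) <= 2


def survives_raid10(failed, n_disks):
    # Assumed layout: disks (0,1), (2,3), ... are mirror pairs striped together.
    # The array is lost only if both members of some pair fail.
    pairs = [(i, i + 1) for i in range(0, n_disks, 2)]
    return not any(a in failed and b in failed for a, b in pairs)


def double_failure_survival(survives, n_disks=4):
    """Count how many of the possible two-disk failures the layout survives."""
    combos = list(combinations(range(n_disks), 2))
    ok = sum(survives(set(c), n_disks) for c in combos)
    return ok, len(combos)


for name, fn in [("RAID5", survives_raid5),
                 ("RAID10", survives_raid10),
                 ("RAID6", survives_raid6)]:
    ok, total = double_failure_survival(fn)
    print(f"{name}: survives {ok} of {total} two-disk failures")
# RAID5: survives 0 of 6 two-disk failures
# RAID10: survives 4 of 6 two-disk failures
# RAID6: survives 6 of 6 two-disk failures
```

RAID10 only dies when both halves of the same mirror fail, which is why it survives "_some_" double failures, while RAID6 survives all of them.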

The real solution is RAID scrubbing: a low-level background process that
reads every sector of every disk, so bad areas are found while the array
still has the redundancy to repair them. All of the real RAID systems do
this (usually scheduled weekly, or every other week). Most 3ware RAID cards
don't have this feature.
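The read half of scrubbing can be approximated from userland by reading a whole device once and recording the offsets that fail. This is a minimal sketch, assuming a Unix system, a device (or file) path you choose yourself, and that seeking to the end reports its size (true for regular files and Linux block devices; other systems may need a platform-specific ioctl). Unlike a controller-level scrub, it does not check parity or rewrite bad sectors from redundancy, and it only sees the devices the OS exposes.

```python
import os


def scrub(path, chunk_size=1 << 20):
    """Read every byte of `path` once; return (bytes_read, bad_offsets).

    An unreadable chunk raises OSError (typically EIO on a bad sector).
    Its offset is recorded and the loop skips ahead instead of aborting,
    so one bad area does not hide others further along the device.
    """
    bad_offsets = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # assumed to report the full size
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, length, offset)
            except OSError:
                bad_offsets.append(offset)
                offset += length  # skip the unreadable chunk
                continue
            if not data:
                break  # unexpected end of data
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

Usage might look like `scrub("/dev/ad4")` run periodically from cron; the device name here is only a placeholder. A non-empty `bad_offsets` list is the early warning that, in the story above, arrived only during the rebuild.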

So rather than not using RAID5 or RAID6 again, you should just not use
3ware anymore.


Tom


It's a common mistake to assume that regular maintenance is not required,
which is what "It was working for years" implies.
You're just waiting for some event to happen that could result in a
catastrophic failure like this. It's always better to act rather than react,
especially when dealing with critical data.

-Manjunath
_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"


