A little story of failed raid5 (3ware 8000 series)



Hello!

Here is my newest story about why one should
never use raid5.

The controller is an 8xxx-4LP.
I have had a simple 360GB raid5 with 4 drives since 2004.
Only about a year ago I realized how much speed I had
wasted by saving a lousy 120GB. I should have chosen
bigger drives and set up two mirrors instead.

But that's not the point. A week ago one drive just
totally failed. It fell out of the unit, and when I tried
to rebuild the unit, the rebuild failed. It seemed like the drive
electronics had failed. Anyhow, I found one of the newest 160GB Seagate
drives as a replacement (half as thick, with very nicely done
electronics on it).

A day ago at 11 am I turned off the server,
pulled out the old drive, installed the new one, turned the server back on
and, an hour later, started the rebuild from a remote location via the web interface.
After about 5 minutes the machine became unresponsive. I tried rebooting:
nothing. I went to the machine and figured out that the rebuild had failed (at 0%)
and that some data could not be read because of bad sectors.

Well, hell, I thought. Maybe I could tell the controller to ignore all the
errors and just carry on rebuilding, then figure out which drive has failed,
replace it, rebuild again and restore the corrupted data from backup.
No way, the controller said.

- I cannot make it ignore read errors
- I cannot figure out which drive has the bad sectors
(maybe someone knows how?)

But I don't understand how and why it happened. Only 6 hours earlier (the night before)
all those files were backed up fine without any read errors. And now, right after replacing
the drive and starting the rebuild, it said there were bad sectors all over those files.
How come?

Well. Since we have a bunch of full and incremental paranoid backups, no data was lost and
we are in the process of recovering. However, I simply imagined what would happen if one more
drive failed completely. That would mean we had lost all the data, since none of the disks
that were left contains a complete readable copy of the data (unlike a mirror, for example).
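
A minimal Python sketch of why a degraded raid5 is so fragile. It only
models XOR parity on a single stripe; the block contents and the function
names are made up for illustration, and it makes no claim about how the
3ware firmware actually works.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(stripe):
    """Rebuild the single missing block (marked None) of one stripe.

    Raises ValueError unless exactly one block is unavailable. Two missing
    blocks is the case described above: one drive replaced, plus an
    unreadable sector on one of the remaining drives.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        raise ValueError(f"cannot rebuild: {len(missing)} blocks unavailable")
    survivors = [block for block in stripe if block is not None]
    return xor_blocks(survivors)


# One stripe on a 4-drive raid5: three data blocks plus their parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

# Drive 1 was replaced with a blank disk: its block comes back from the rest.
degraded = [stripe[0], None, stripe[2], stripe[3]]
recovered = rebuild_missing(degraded)

# Same stripe, but drive 2 also returns a read error for this sector.
double_fault = [stripe[0], None, None, stripe[3]]
try:
    rebuild_missing(double_fault)
    stripe_lost = False
except ValueError:
    stripe_lost = True
```

With one block missing the stripe is rebuilt from the other three; with a
second block unreadable there is nothing left to rebuild it from.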

So, we are migrating to a mirror config with huge disks.

I am thinking about raid10 for more performance. It seems a lot safer, since the data is still readable if one disk of each mirror pair fails, and even if all disks have bad blocks the data can be recovered by a fairly simple script from the counterpart (as long as the same block is not bad on both halves of a pair). But the problem, however,

So, no raid5 or even raid6 for me any more. Never!
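
The "fairly simple script" for recovering a mirror from its counterpart
could look roughly like this in Python. This is only a sketch under
assumptions: each mirror half is modelled as a list of blocks, with None
standing in for a block that returned a read error, and all names are
hypothetical. A real script would read raw devices and would need proper
error handling.

```python
def merge_mirror_halves(half_a, half_b):
    """Combine two copies of the same volume block by block.

    Returns the merged blocks and the indexes of blocks that were
    unreadable on both halves (those stay None in the merged list).
    """
    if len(half_a) != len(half_b):
        raise ValueError("mirror halves must have the same number of blocks")
    merged, lost = [], []
    for index, (a, b) in enumerate(zip(half_a, half_b)):
        # Prefer the copy from the first half, fall back to the second.
        block = a if a is not None else b
        if block is None:
            lost.append(index)
        merged.append(block)
    return merged, lost


# Both disks have bad blocks, but mostly in different places.
disk_a = [b"blk0", None, b"blk2", None]
disk_b = [b"blk0", b"blk1", None, None]
merged, lost = merge_mirror_halves(disk_a, disk_b)
```

Only block 3, which is bad on both disks, would still have to come from
backup.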


--
Regards,
Artem

_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"


