Re: hard disk failure - now what?



Kelly Martin <kellymartin@xxxxxxxxx> writes:

> I just experienced a hard drive failure on one of my FreeBSD 7.2
> production servers with no backup! I am so mad at myself for not
> backing up!! Now it's a salvage operation. Here are the kinds of errors
> I was getting on the console, over and over:
>
> ad4: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=441633503
> ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
> ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
> ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
> ad4: FAILURE - WRITE_DMA48 timed out LBA=441633375
> g_vfs_done():ad4s1f[WRITE(offset=216338284544, length=16384)]error = 5
>
> I could still log in to the machine (after an eternity) but got lots of
> read/write errors along the way. The offset shown in the errors kept
> changing, so I thought it was a hardware eSATA controller issue
> instead of a bad sector on the drive - I replaced the motherboard,
> but the problem persisted. So I bought a new hard drive and have
> re-installed FreeBSD 7.2 on it. I'd like to plug in the old hard drive
> today, mount it and salvage as much as I can... especially the
> database files, config files, etc.
>
> My question: what kind of checks and/or repair tools should I run on
> the damaged drive after it's mounted? Or should I mount it
> read-only and start backing it up? I am hoping most of my data is
> still there, but also don't want to damage it further. I desperately
> need to salvage the data. What do the kind people on this list
> recommend?

First, copy the entire disk *without* mounting it, using dd(1). I
believe that "conv=noerror" will be necessary, so that dd keeps going
past read errors instead of stopping at the first one.
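
As a rough sketch of what that could look like (not a tested recipe for
this particular drive): the device name /dev/ad4 is taken from the
console errors quoted above, the image path is a hypothetical location on
the new drive, and the 64k block size and the extra "sync" are my own
assumptions. "sync" pads a failed read with zeros so that everything
after a bad spot stays at its original offset in the image.

```shell
#!/bin/sh
# Sketch: image a failing disk without mounting it.
# image_disk is a hypothetical helper name, not an existing tool.

image_disk() {
    src=$1   # source device, e.g. /dev/ad4 (from the console errors)
    dst=$2   # image file on a DIFFERENT, healthy disk with enough free space

    # conv=noerror  keep going after a read error instead of aborting
    # conv=sync     pad short or failed reads with zeros to a full block,
    #               so later data keeps its original offset in the image
    # bs=64k        assumed block size; a failed read zeroes a whole block,
    #               so a smaller bs loses less data but copies more slowly
    dd if="$src" of="$dst" bs=64k conv=noerror,sync
}

# Example (as root, with the old disk attached but NOT mounted):
#   image_disk /dev/ad4 /mnt/rescue/ad4.img
```

Any checks or repair attempts can then be made against the copy rather
than the failing drive.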

--
Lowell Gilbert, embedded/networking software engineer, Boston area
http://be-well.ilk.org/~lowell/
_______________________________________________
freebsd-questions@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscribe@xxxxxxxxxxx"


