Re: RAID and NFS exports (Possible Data Corruption)
From: Sumit Shah (shah_at_ucla.edu)
Date: 07/15/03
- In reply to: David Malone: "Re: RAID and NFS exports (Possible Data Corruption)"
Date: Tue, 15 Jul 2003 13:59:08 -0700
To: David Malone <dwmalone@maths.tcd.ie>
Thanks for the reply.
>> ad4: hard error reading fsbn 242727552
>
> The error means that the disk said that there was an error
> trying to read this block. You say that when you rebooted that the
> controller said a disk had gone bad, so this would sort of confirm
> this. (I could believe that restarting mountd might upset raid stuff
> if there were a kernel bug, but it seems very unlikely it could
> cause a disk to go bad.)
The full error was something like this on _both_ of the identical
systems, even _before_ the reboot. After this message we could no
longer read, write, or fsck /dev/ar0:
ad7: hard error reading fsbn 291786506 of 0-127 (ad7 bn 291786506; cn 289470 tn 11 sn 53) trying PIO mode
ad7: DMA problem fallback to PIO mode
ad7: DMA problem fallback to PIO mode
ad7: DMA problem fallback to PIO mode
ad7: DMA problem fallback to PIO mode
ad7: DMA problem fallback to PIO mode
ad7: hard error reading fsbn 291786586 of 0-127 (ad7 bn 291786586; cn 289470 tn 13 sn 7) status=59 error=40
ar0: ERROR - array broken
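For anyone reading these logs, the status=59 and error=40 values can be decoded bit by bit. The sketch below assumes the driver prints both values in hexadecimal and that they are the raw ATA status and error registers; the bit names come from the ATA register layout, and the helper name decode_bits is made up for illustration.

```python
# Assumption: status=59 and error=40 are hex dumps of the ATA status and
# error registers. Bit names follow the ATA register definitions.
STATUS_BITS = {
    0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
    0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR",
}
ERROR_BITS = {
    0x80: "BBK/ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
    0x08: "MCR", 0x04: "ABRT", 0x02: "TK0NF", 0x01: "AMNF",
}


def decode_bits(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in names.items() if value & bit]


if __name__ == "__main__":
    print("status:", decode_bits(int("59", 16), STATUS_BITS))
    print("error: ", decode_bits(int("40", 16), ERROR_BITS))
    # status: ['DRDY', 'DSC', 'DRQ', 'ERR']
    # error:  ['UNC']
```

Under those assumptions, error=40 is the UNC (uncorrectable data) bit, meaning the drive itself reported that it could not read the sector.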
There were also a variety of messages like this one:
Jul 14 02:55:39 thorimage1 /kernel: ad7: hard error reading fsbn 291786586 of 0-127 (ad7 bn 291786586; cn 289470 tn 13 sn 7) status=59 error=40
where the device named in place of ad7 was, somewhat randomly, any of
the 6 devices in the array.
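To check how the errors are spread across the devices, the hard-error lines can be tallied per disk. This is only a rough sketch: the regular expression is modelled on the messages quoted in this thread, the function name count_hard_errors is made up, and it assumes each message sits on a single line in the log.

```python
import re
from collections import Counter

# Pattern modelled on the kernel messages quoted in this thread.
HARD_ERROR = re.compile(r"\b(ad\d+): hard error reading fsbn (\d+)")


def count_hard_errors(lines):
    """Count 'hard error reading fsbn' messages per ATA device."""
    counts = Counter()
    for line in lines:
        match = HARD_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "ad4: hard error reading fsbn 242727552",
        "ad7: DMA problem fallback to PIO mode",
        "Jul 14 02:55:39 thorimage1 /kernel: ad7: hard error reading fsbn "
        "291786586 of 0-127 (ad7 bn 291786586; cn 289470 tn 13 sn 7) "
        "status=59 error=40",
    ]
    for device, n in sorted(count_hard_errors(sample).items()):
        print(device, n)
```

The same function could be fed the lines of a syslog file; one device dominating the tally would point at a single bad disk, while an even spread would point more toward something shared such as the controller, cabling, or power.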
>
> My best guess would be that you have a bad batch of disks that
> happen to have failed in similar ways. It is possible that restarting
> mountd uncovered the errors, 'cos I think mountd internally does
> a remount of the filesystem in question and that might cause a chunk
> of stuff to be flushed out on to the disk, highlighting an error.
>
> (I had a bunch of the IBM "deathstar" disks fail on me within the
> space of a week or so, after they'd been in use for about six
> months.)
It certainly sounds reasonable that restarting mountd only exposed a
problem that was already there. It still seems strange, and too much
of a coincidence, that two sets of six disks on two different but
identical machines would fail in exactly the same way within an hour.
I guess that, given the decline in hard drive quality, things like
this might be more likely.
Thanks,
Sumit