FreeBSD PseudoRAID RAID0 array broken on atapci1: <Intel ICH5 SATA150 controller>



Hi, I need some help recovering from this. First, some backstory: I'm running 6.2-STABLE i386 from Sep 17, 2007. My /home slice is mounted from /dev/ar0s1e, and when all is good the relevant kernel messages look like this:

atapci1: <Intel ICH5 SATA150 controller>
ata2: <ATA channel 0> on atapci1
ata3: <ATA channel 1> on atapci1
ad4: 381554MB <WDC WD4000YR-01PLB0 01.06A01> at ata2-master SATA150
ad6: 381554MB <WDC WD4000YR-01PLB0 01.06A01> at ata3-master SATA150
ar0: 763108MB <FreeBSD PseudoRAID RAID0 (stripe 256 KB)> status: READY
ar0: disk0 READY using ad4 at ata2-master
ar0: disk1 READY using ad6 at ata3-master

Today this server crashed with the following logged:

ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=144888320
ad4: TIMEOUT - READ_DMA retrying (1 retry left) LBA=143390319
ad4: FAILURE - device detached
ar0: FAILURE - RAID0 array broken
subdisk4: detached
ad4: detached
g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
initiate_write_filepage: already started
g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6144000, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6160384, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6176768, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6193152, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6209536, length=2048)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=65536, length=2048)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=147801325568, length=12288)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=147142686720, length=2048)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=65536, length=2048)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6144000, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6160384, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6176768, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6193152, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=6209536, length=2048)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=146831867904, length=16384)]error = 5
g_vfs_done():ar0s1e[WRITE(offset=147024330752, length=16384)]error = 5
initiate_write_filepage: already started
g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
initiate_write_filepage: already started
g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
initiate_write_filepage: already started
g_vfs_done():ar0s1e[WRITE(offset=147801325568, length=12288)]error = 5
initiate_write_filepage: already started
g_vfs_done():ar0s1e[WRITE(offset=147142686720, length=2048)]error = 5
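To relate a byte offset on the array to a member disk, the usual RAID0 striping arithmetic can be sketched as follows. This is a minimal sketch, assuming plain round-robin striping over two disks with the 256 KB stripe that ar0 reports. It ignores any slice, partition, or metadata offset, so the ar0s1e offsets in the log would first need the partition's starting offset added. The name `locate` is hypothetical, and the ataraid driver's actual layout may differ.

```python
STRIPE_BYTES = 256 * 1024  # stripe size reported in the ar0 kernel message
NUM_DISKS = 2              # ad4 and ad6


def locate(array_offset, stripe=STRIPE_BYTES, disks=NUM_DISKS):
    """Map a byte offset on the array to (disk index, byte offset on that disk).

    Assumption: plain round-robin RAID0 with no header in front of the data.
    """
    # Which stripe the offset falls in, and how far into that stripe it is.
    stripe_no, within = divmod(array_offset, stripe)
    # Stripes alternate across the member disks in order.
    disk = stripe_no % disks
    # Each disk holds every `disks`-th stripe, packed back to back.
    disk_offset = (stripe_no // disks) * stripe + within
    return disk, disk_offset


if __name__ == "__main__":
    for off in (0, STRIPE_BYTES, 2 * STRIPE_BYTES + 512):
        print(off, locate(off))
```

Under these assumptions, consecutive 256 KB stripes alternate between disk 0 and disk 1, which is why losing either drive leaves the whole filesystem unreadable.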

Now the kernel messages read:

ar0: FAILURE - RAID0 array broken
ar0: 763108MB <FreeBSD PseudoRAID RAID0 (stripe 256 KB)> status: BROKEN
ar0: disk0 READY using ad4 at ata2-master
ar0: disk1 DOWN no device found for this subdisk
ar1: 763108MB <FreeBSD PseudoRAID RAID0 (stripe 256 KB)> status: BROKEN
ar1: disk0 DOWN no device found for this subdisk
ar1: disk1 READY using ad6 at ata3-master

For some reason the second disk in the array now shows up as ar1 instead of as part of ar0. I suspect there's gotta be some way to force the two drives to show up as part of the same array, perhaps by editing the PseudoRAID metadata on disk without putting any of the UFS2 data in jeopardy. Any pointers on where to start poking around for the relevant metadata structures on disk, or what to search for? I figure if I can dd the metadata off the disks, tweak a field or two and then dd the whole mess back, I stand a chance of either getting it all back or hosing the array irrevocably. ;) Or maybe atacontrol could be used to re-create the metadata without destroying the UFS2 filesystem on the array? I have a kernel coredump from this crash if that helps with the analysis.
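The "dd the metadata off the disks" step could look something like the read-only sketch below. It is a minimal sketch under stated assumptions: the function name `save_region` is hypothetical, the 512-byte sector size is assumed, and the offset and length of the metadata are left as parameters because nothing above establishes where PseudoRAID keeps it. It only copies a region into a backup file and returns a checksum; it never writes to the source.

```python
import hashlib


def save_region(device_path, offset, length, backup_path, sector=512):
    """Copy `length` bytes at `offset` from device_path into backup_path.

    The source is opened read-only. Returns the SHA-256 hex digest of the
    saved bytes so a later restore can be checked against it.
    """
    # Raw disk devices generally want sector-aligned I/O (sector size assumed).
    if offset % sector or length % sector:
        raise ValueError("offset and length must be multiples of the sector size")
    with open(device_path, "rb", buffering=0) as src:
        src.seek(offset)
        data = src.read(length)
    if len(data) != length:
        raise IOError("short read: got %d of %d bytes" % (len(data), length))
    with open(backup_path, "wb") as dst:
        dst.write(data)
    return hashlib.sha256(data).hexdigest()
```

Keeping an untouched copy and its digest for each member disk means any edited metadata can be compared against, or replaced by, the original bytes.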

--
Yarema


