WRITE_DMA48 error causing loss of ZFS array



I've experienced several DMA errors over the past few months with various incarnations of 7.0, all of which were fixed.
It seems I have a new one. I don't know whether there is a connection, but this only occurred after updating to 7.0-BETA1 last weekend.
I have a small UFS mirror for /boot and everything else on one ZFS pool.
I scrub my zpool in the early hours every Monday morning. Last Monday, when I got to the console, I saw DMA_ERRORs slowly scrolling up the screen. I could type 'root' at the login prompt on a virtual terminal, but it just hung. There was nothing I could do apart from reset.
When it came back, it was fine AFAICT. Later that day I got the problem again; after a reset, all was OK. Confusingly, I then managed to scrub the whole pool with no problems.
However, I had the same symptoms again this morning. Here are a couple of screenshots, as nothing got logged; the pool seemed to be effectively unavailable:

http://webhost.salford.ac.uk/aix502/29102007(001).thb.jpg
http://webhost.salford.ac.uk/aix502/29102007(004).thb.jpg

The errors all seem to be on one drive. AFAICT it had probably been going on for hours by the time I got to it, and it seemed like it would continue this way forever.
I've looked in the smartctl output for the drive (I run a short offline test every day and a long offline test every Sunday), but there's nothing there. I also ran the Hitachi Drive Fitness Test on the drive and no problems were reported.
This is one of two drives on a JMB363 controller, which is in IDE mode. That may make a difference, as I've seen posts referring to problems with that controller, though I think they might have been dealing with AHCI mode only?
Is this a known problem? I've seen mention of known problems with ata, but it's hard to get a clear picture of what is currently outstanding from searching the last few months of -current.
Also, why do I lose my zpool and have to reset? This one drive failing should not cause a problem for the zpool, as it has redundancy. So why am I effectively losing the whole pool due to this error?
I'll be glad to provide any more info.
Many thanks in advance.

--
Mark Powell - UNIX System Administrator - The University of Salford
Information Services Division, Clifford Whitworth Building,
Salford University, Manchester, M5 4WT, UK.
Tel: +44 161 295 4837 Fax: +44 161 295 5888 www.pgp.com for PGP key
_______________________________________________
freebsd-current@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@xxxxxxxxxxx"


