WRITE_DMA48 error causing loss of ZFS array
- From: "Mark Powell" <M.S.Powell@xxxxxxxxxxxxx>
- Date: Mon, 29 Oct 2007 09:59:11 +0000 (GMT)
I've experienced several DMA errors over the past few months with various incarnations of 7.0, all of which were fixed.
It seems I have a new one. I don't know whether there is a connection, but this only started occurring after I updated to 7.0-BETA1 last weekend.
I have a small ufs mirror for /boot and everything else on one ZFS pool.
I scrub my zpool in the early hours of every Monday morning. Last Monday, when I got to the console, I saw DMA_ERRORs slowly scrolling up the screen. I could type 'root' at the login prompt on a virtual terminal, but it just hung. There was nothing I could do apart from reset.
When it came back it was fine AFAICT. Later that day I got the problem again. I reset and all was OK. Confusingly, I then managed to scrub the whole pool successfully with no problems.
However, this morning I had the same symptoms again. Nothing got logged, as the pool seemed to be effectively unavailable, so here are a couple of screenshots:
http://webhost.salford.ac.uk/aix502/29102007(001).thb.jpg
http://webhost.salford.ac.uk/aix502/29102007(004).thb.jpg
The errors all seem to be on one drive. AFAICT it had probably been going on for hours by the time I got to it, and it looked like it would continue this way forever.
I've looked in the smartctl output for the drive (I run a short offline test every day and a long offline test every Sunday) but found nothing there. I also ran the Hitachi Drive Fitness Test on the drive, and it reported no problems.
This is one of two drives on a JMB363 controller, which is in IDE mode. I mention this in case it makes a difference: I've seen posts referring to problems with that controller, but I think they might have been dealing with AHCI mode only.
Is this a known problem? I've seen mention of known problems with ata, but it's hard to get a clear picture of what is currently outstanding from searching the last few months of -current.
Also, why do I lose my zpool and have to reset? The failure of this one drive should not cause a problem for the zpool, as it has redundancy, so why am I effectively losing the whole pool due to this error?
I'll be glad to provide any more info.
Many thanks in advance.
--
Mark Powell - UNIX System Administrator - The University of Salford
Information Services Division, Clifford Whitworth Building,
Salford University, Manchester, M5 4WT, UK.
Tel: +44 161 295 4837 Fax: +44 161 295 5888 www.pgp.com for PGP key
_______________________________________________
freebsd-current@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@xxxxxxxxxxx"