FYI: Promise TX4 silent corruption (RELENG_7)



Hello,

The following is not a request for help or a bug report as such; I just
want to put the information out there in case it helps other people by
encouraging active checking for silent data corruption (it also happens
to be a good "saved yet again by ZFS" story).

I was moving some disks to a machine that didn't have SATA ports for
them, so I took one of the TX4s I had left over from some previous
desperate attempts to get working SATA in another machine.

At first, things *seemed* good. No DMA timeouts or anything like
that. Streamed through four 250 GB drives with no problem; ran a bunch
of rsyncs of ports trees during the night.

However, once I started dd'ing large files and reading them back in, I
started getting I/O errors from ZFS because of checksum
mismatches. It turns out all the drives connected to the TX4 in the
raidz2 were generating checksum errors (the one that was not connected
to the TX4 was fine). Writing a 2-3 GB file of zeroes yields a handful
of checksum mismatches on the subsequent scrub.
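
For anyone who wants to run the same kind of write-then-read-back check
on a filesystem that does not checksum data itself, here is a rough
Python sketch. It is not what was run above (that was plain dd plus a
ZFS scrub); the function names, the seed and the chunk size are made up
for illustration. It writes deterministic non-zero data, hashes it on
the way out, then re-reads the file and compares. Note that the file
must be larger than RAM (or caches must otherwise be bypassed), or the
read-back may be served from cache and never touch the controller.

```python
import hashlib
import os

CHUNK = 1024 * 1024  # 1 MiB per write/read (arbitrary choice)


def write_pattern(path, total_bytes, seed=b"tx4-test"):
    """Write total_bytes of deterministic data; return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    block = seed
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            # Chain hashes so every chunk differs and the data is not all zeroes.
            block = hashlib.sha256(block).digest()
            chunk = (block * (CHUNK // len(block)))[: total_bytes - written]
            f.write(chunk)
            digest.update(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())  # push the data out of the process and OS buffers
    return digest.hexdigest()


def read_digest(path):
    """Re-read the file from disk and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, total_bytes):
    """Return True if the data read back matches what was written."""
    return write_pattern(path, total_bytes) == read_digest(path)
```

Calling something like verify("/tank/testfile", 3 * 1024**3) (the path
is hypothetical) would roughly match the 2-3 GB test described above. A
False result means the data changed somewhere between write and read,
but it does not say which component is at fault.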

Since then I have tried two distinct TX4 cards (but only in one
PCI slot). Both suffer from the same problem. Amazingly, a SiI 3114
*does* seem to work in the same PCI slot: no corruption, and none of
the DMA timeouts and whatnot that I was expecting from a SiI card.

This was on amd64, RELENG_7. Same SATA cables used on all
drives. Drives on TX4 were also in a Supermicro hotswap enclosure,
which may or may not be related (but again, no problem with SiI).

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@xxxxxxxxxxxx>'
Key retrieval: Send an E-Mail to getpgpkey@xxxxxxxxx
E-Mail: peter.schuller@xxxxxxxxxxxx Web: http://www.scode.org



