Re: FreeBSD + ZFS on a production server?



On Mon, 9 Jun 2008 23:31:35 +0200 (CEST),
Wojciech Puchar <wojtek@xxxxxxxxxxxxxxxxxxxxxxx> said:

W> but why you need [a filesystem for linux that do checksum on the fly]?! all
W> PATA/SATA drives do checksumming on every read. in hardware, no CPU load.

These days, hardware isn't just hardware. A disk drive can have around
300,000 lines of low-level firmware, and who wants to bet that it's
completely bug-free? Silent write errors are actually a big problem:

http://www.usenix.org/publications/login/2008-06/openpdfs/bairavasundaram.pdf
An Analysis of Data Corruption in the Storage Stack

"In this paper, we present the first large-scale study of data corruption.
We analyze corruption instances recorded in production storage systems
containing a total of 1.53 million disk drives, over a period of 41 months.
We study three classes of corruption: checksum mismatches, identity
discrepancies, and parity inconsistencies. We focus on checksum mismatches
since they occur the most; more than 400,000 instances of checksum
mismatches over the 41-month period."
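
To make the point concrete, here is a minimal sketch (my own toy
illustration, not ZFS code) of why a checksum kept by the filesystem,
apart from the data block, catches what the drive's own per-sector
checking cannot. The class name, block size, and the choice of SHA-256
are assumptions made for the example. The "disk" is just a dictionary,
and the firmware bug is simulated by changing a block behind the
store's back, so the drive would still hand the data back without
reporting any error.

```python
import hashlib

BLOCK_SIZE = 4096  # arbitrary block size chosen for the example


def checksum(data: bytes) -> str:
    """Return a hex digest of the block contents."""
    return hashlib.sha256(data).hexdigest()


class ChecksummedStore:
    """Toy block store that keeps each checksum apart from its block."""

    def __init__(self):
        self.blocks = {}     # block number -> bytes (stands in for the disk)
        self.checksums = {}  # block number -> digest (stored separately)

    def write(self, blkno: int, data: bytes) -> None:
        self.blocks[blkno] = data
        self.checksums[blkno] = checksum(data)

    def read(self, blkno: int) -> bytes:
        data = self.blocks[blkno]
        # Verify on every read, so bad data never reaches the caller.
        if checksum(data) != self.checksums[blkno]:
            raise IOError(f"checksum mismatch on block {blkno}")
        return data


store = ChecksummedStore()
store.write(7, b"A" * BLOCK_SIZE)

# Simulate a firmware bug: the block on "disk" changes, but nothing
# above the drive is told about it.
store.blocks[7] = b"B" * BLOCK_SIZE

try:
    store.read(7)
    detected = False
except IOError:
    detected = True  # the filesystem-level checksum caught the problem

print("corruption detected:", detected)
```

With redundancy (a mirror or raidz), a filesystem that detects a
mismatch like this can go on to read a good copy and repair the bad
one; the sketch above only covers the detection half.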

--
Karl Vogel I don't speak for the USAF or my company

Mangled song lyric: Looks like tomatoes
Actual lyric: Looks like we made it. (Barry Manilow)
_______________________________________________
freebsd-questions@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscribe@xxxxxxxxxxx"


