Re: RAID and NFS exports (Possible Data Corruption)

From: Sumit Shah (shah_at_ucla.edu)
Date: 07/15/03

    Date: Tue, 15 Jul 2003 13:59:08 -0700
    To: David Malone <dwmalone@maths.tcd.ie>
    
    

    Thanks for the reply.

    >> ad4: hard error reading fsbn 242727552
    >
    > The error means that the disk said that there was an error
    > trying to read this block. You say that when you rebooted that the
    > controller said a disk had gone bad, so this would sort of confirm
    > this. (I could believe that restarting mountd might upset raid stuff
    > if there were a kernel bug, but it seems very unlikely it could
    > cause a disk to go bad.)

    The full error on _both_ of the identical systems, even _before_ the
    reboot, was something like the following. After these messages we
    could not read, write, or fsck /dev/ar0:

    ad7: hard error reading fsbn 291786506 of 0-127 (ad7 bn 291786506; cn 289470 tn 11 sn 53) trying PIO mode
    ad7: DMA problem fallback to PIO mode
    ad7: DMA problem fallback to PIO mode
    ad7: DMA problem fallback to PIO mode
    ad7: DMA problem fallback to PIO mode
    ad7: DMA problem fallback to PIO mode
    ad7: hard error reading fsbn 291786586 of 0-127 (ad7 bn 291786586; cn 289470 tn 13 sn 7) status=59 error=40
    ar0: ERROR - array broken
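
For anyone reading the status=59 error=40 pair above, here is a minimal sketch of decoding the two values. It assumes the driver prints them in hexadecimal and that the bits follow the classic ATA status and error register layouts; the names STATUS_BITS, ERROR_BITS and decode are made up for this illustration and are not part of any FreeBSD tool.

```python
# Assumption: the values in the log are hexadecimal and map onto the
# classic ATA status and error register bit layouts.
STATUS_BITS = {
    0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
    0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR",
}
ERROR_BITS = {
    0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
    0x08: "MCR", 0x04: "ABRT", 0x02: "NM", 0x01: "AMNF",
}


def decode(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in names.items() if value & bit]


status = int("59", 16)  # from "status=59"
error = int("40", 16)   # from "error=40"
print("status:", decode(status, STATUS_BITS))
print("error:", decode(error, ERROR_BITS))
```

Under those assumptions the error value has only the UNC (uncorrectable data) bit set, which would be consistent with the drive itself reporting an unreadable sector rather than a bus or cabling fault.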

    There was also a variety of messages like these:
    Jul 14 02:55:39 thorimage1 /kernel: ad7: hard error reading fsbn 291786586 of 0-127 (ad7 bn 291786586; cn 289470 tn 13 sn 7) status=59 error=40

    where the device named in place of ad7 was, somewhat randomly, any of
    the 6 devices in the array.
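
To check how the errors are spread across the six devices, something like the following rough sketch could tally the "hard error" lines per device. It assumes each kernel message sits on a single line, as in /var/log/messages before mail wrapping; the pattern and the function name tally_hard_errors are hypothetical and only match the message format quoted in this thread.

```python
import re
from collections import Counter

# Matches e.g. "ad7: hard error reading fsbn 291786586 ..."
HARD_ERROR = re.compile(r"\b(ad\d+): hard error reading fsbn (\d+)")


def tally_hard_errors(lines):
    """Count 'hard error' messages per ATA device name."""
    counts = Counter()
    for line in lines:
        match = HARD_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines taken from the messages quoted in this thread.
sample = [
    "Jul 14 02:55:39 thorimage1 /kernel: ad7: hard error reading fsbn "
    "291786586 of 0-127 (ad7 bn 291786586; cn 289470 tn 13 sn 7) "
    "status=59 error=40",
    "ad7: DMA problem fallback to PIO mode",
    "ad4: hard error reading fsbn 242727552",
]

for device, count in sorted(tally_hard_errors(sample).items()):
    print(device, count)
```

Running it over the full log from each machine would show whether the errors cluster on one or two disks or really are spread over all six.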

    >
    > My best guess would be that you have a bad batch of disks that
    > happen to have failed in similar ways. It is possible that restarting
    > mountd uncovered the errors, 'cos I think mountd internally does
    > a remount of the filesystem in question and that might cause a chunk
    > of stuff to be flushed out on to the disk, highlighting an error.
    >
    > (I had a bunch of the IBM "deathstar" disks fail on me within the
    > space of a week or so, after they'd been in use for about six
    > months.)

    It certainly sounds reasonable that restarting mountd only exposed a
    problem that was already there. It still seems strange, and too much
    of a coincidence, that two sets of six disks on two different but
    identical machines would fail in exactly the same way within an hour
    of each other. I guess, given the decline in hard drive quality,
    things like this might be more likely.

    Thanks,
    Sumit


