Re: filesystem corruption with 1TB filesystem, 4.9-STABLE, twe

From: Doug White (dwhite_at_gumbysoft.com)
Date: 04/30/04

    Date: Fri, 30 Apr 2004 10:27:53 -0700 (PDT)
    To: Ollie Cook <ollie@uk.clara.net>
    
    

    On Sun, 18 Apr 2004, Ollie Cook wrote:

    > I am experiencing filesystem corruption while using a 1TB (appx.) partition
    > under 4.9-STABLE (sources from Mar 17) and an 8-port 3ware ATA RAID card (twe
    > device driver). The RAID set comprises 5x250GB ATA disks.

    [...]

    The type of corruption you're seeing would be consistent with one of the
    disks not accepting writes or some other sort of array corruption. I
    realize it'll take forever, but can you run an array verify? I wonder if
    the BIOS isn't detecting a failed disk because that disk isn't throwing
    errors, yet it isn't doing any useful work either.
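
    For reference, this is roughly what an array verify checks, modelled in
    Python under simplifying assumptions. Each stripe is a list of equal-sized
    blocks, one of which is XOR parity, so XOR-ing every block in a healthy
    stripe gives all zeros wherever the parity sits. This illustrates the
    idea only; it is not the 3ware firmware's algorithm. The "dropped_disk"
    argument of write_stripe() is a hypothetical way to mimic a disk that
    accepts a write but keeps its old contents.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)


def make_stripe(data_blocks):
    """Data blocks followed by their XOR parity (parity kept last for simplicity)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def write_stripe(old_stripe, new_data, dropped_disk=None):
    """Rewrite data and parity; optionally let one disk silently keep its old data."""
    stripe = make_stripe(new_data)
    if dropped_disk is not None:
        stripe[dropped_disk] = old_stripe[dropped_disk]
    return stripe


def verify(stripes):
    """Indices of stripes whose blocks do not XOR to zero (parity mismatch)."""
    return [n for n, stripe in enumerate(stripes) if any(xor_blocks(stripe))]


if __name__ == "__main__":
    # 5 disks: 4 data blocks + 1 parity block per stripe, 4 bytes per block
    stripes = [make_stripe([bytes([s * 4 + d] * 4) for d in range(4)]) for s in range(3)]
    stripes[1] = write_stripe(stripes[1], [b"msg0", b"msg1", b"msg2", b"msg3"], dropped_disk=2)
    print(verify(stripes))  # -> [1]
```

    A mismatch says a stripe is inconsistent. With a single parity block,
    it does not say which disk is at fault.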

    >
    > The kernel logs such messages as:
    >
    > Apr 17 16:25:37 heman /kernel: free inode /clara/170175645 had 137391860 blocks
    > Apr 17 17:18:29 heman /kernel: free inode /clara/169969279 had 1803039330 blocks
    > Apr 17 18:06:38 heman /kernel: free inode /clara/171086221 had 544501359 blocks
    >
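
    One cheap check on those numbers assumes an i386 (little-endian) box. The
    UFS1 on-disk inode keeps its block count in a signed 32-bit field, so each
    reported count can be turned back into the four raw bytes read from the
    inode. Counts that decode to printable text would hint that file data
    ended up where an inode should be, which would fit a misdirected or lost
    write. This is a heuristic, not proof.

```python
import struct

# Block counts copied from the "free inode ... had N blocks" messages above.
COUNTS = (137391860, 1803039330, 544501359)


def raw_bytes(count):
    """The 4 bytes of a signed 32-bit little-endian field holding `count`."""
    return struct.pack("<i", count)


def looks_like_text(raw):
    """True if every byte is printable ASCII (space through '~')."""
    return all(32 <= b < 127 for b in raw)


if __name__ == "__main__":
    for count in COUNTS:
        raw = raw_bytes(count)
        print(count, raw, "printable" if looks_like_text(raw) else "binary")
```

    For these three values, 1803039330 and 544501359 decode to the printable
    strings b"b2xk" and b"ont ", while 137391860 does not. That is
    suggestive, but four bytes per sample is not much evidence.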
    > At the time, the system was copying a large number of small files (email
    > messages) from a busy NFS mount to the RAID5 array. Several processes were
    > each copying different files, and throughput to disk was around 3MB/s.
    >
    > As far as I can tell from sys/ufs/ffs/ffs_alloc.c, this error indicates that a
    > kernel data structure contains unexpected data, but I'm not confident enough to
    > be able to tell what might be causing that.
    >
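
    A reading of the 4.x sources suggests the message comes from the inode
    allocation path, ffs_valloc(), so treat the details as approximate. When
    that path hands out an inode the cylinder-group bitmap marks as free, the
    on-disk inode should hold no blocks. If the block count is nonzero, the
    kernel logs "free inode <mount>/<ino> had N blocks" and zeroes the count
    instead of panicking.

    The toy model below (hypothetical names, Python rather than kernel C)
    shows just that invariant. The bitmap and the inode disagree, so one of
    them was written with the wrong contents.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    mode: int = 0    # 0 means the inode is not in use
    blocks: int = 0  # block count stored in the on-disk inode


def alloc_inode(inodes, free_map, mount, log):
    """Toy model: allocate the first inode the bitmap marks free and sanity-check it.

    inodes:   list of Inode, indexed by inode number
    free_map: list of bools, True where the bitmap says the inode is free
    log:      list that collects any kernel-style warnings
    """
    ino = free_map.index(True)  # raises ValueError if nothing is free
    free_map[ino] = False
    ip = inodes[ino]
    if ip.mode != 0:
        # The kernel treats an in-use "free" inode as fatal; modelled as an exception.
        raise RuntimeError(f"dup alloc: inode {ino} already in use")
    if ip.blocks:
        # Bitmap says free, inode says it owns blocks: warn and clear the count.
        log.append(f"free inode {mount}/{ino} had {ip.blocks} blocks")
        ip.blocks = 0
    return ino
```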
    > After such messages, if I cleanly unmount the filesystem and run fsck, errors
    > are detected. Such errors are:
    >
    > directory corrupted
    > directory contains empty blocks
    > unallocated inode
    > wrong link counts
    >
    > There are many more distinct error messages, but those are the ones I recall.
    > After a number of passes through fsck, the filesystem is eventually marked
    > clean but quite a number of files wind up in lost+found.
    >
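
    Here is why repeated fsck runs can leave files in lost+found. During its
    reference-count check, fsck compares each allocated inode's stored link
    count with the number of directory entries it found pointing at that
    inode. A mismatch is a "wrong link count". An allocated inode that no
    directory references at all has no name, so fsck offers to reconnect it
    under lost+found.

    The sketch below models only that comparison, for regular files
    (directories' "." and ".." entries are ignored), with made-up inode
    numbers. It is not fsck_ffs's code.

```python
from collections import Counter


def check_link_counts(inodes, directories):
    """Compare stored link counts with directory references.

    inodes:      {ino: stored link count} for allocated regular files
    directories: {dir ino: [ino referenced by each entry, ...]}
    Returns ({ino: (stored, counted)}, [unreferenced inos]).
    """
    refs = Counter(child for entries in directories.values() for child in entries)
    wrong, unreferenced = {}, []
    for ino, nlink in sorted(inodes.items()):
        counted = refs[ino]  # Counter returns 0 for missing keys
        if counted == 0:
            unreferenced.append(ino)  # nothing names it: lost+found candidate
        elif counted != nlink:
            wrong[ino] = (nlink, counted)
    return wrong, unreferenced


if __name__ == "__main__":
    inodes = {10: 1, 11: 2, 12: 1, 13: 1}
    directories = {2: [10, 11], 3: [11, 12, 12]}
    print(check_link_counts(inodes, directories))  # ({12: (1, 2)}, [13])
```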
    > Has anyone seen behaviour similar to this with twe RAID sets or large
    > partitions in the past? I've not been able to find reports of similar symptoms
    > using Google.
    >
    > Can anyone offer advice on how I might further debug this problem?
    >
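
    One way to separate a filesystem problem from the array quietly losing
    writes is a write-then-verify run that mimics the workload: several
    workers write many small files and record a checksum for each. Everything
    is re-read later, ideally after an unmount and remount so the data comes
    from disk rather than the buffer cache.

    The harness below is a hypothetical sketch. It uses threads rather than
    separate processes for brevity, and the target directory, file counts and
    sizes are placeholders. It does not remount anything itself.

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_files(root, worker, count, size):
    """Write `count` random files of `size` bytes; return {path: sha256 hex}."""
    wdir = os.path.join(root, f"w{worker}")
    os.makedirs(wdir, exist_ok=True)
    sums = {}
    for i in range(count):
        data = os.urandom(size)
        path = os.path.join(wdir, f"msg{i:06d}")
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the device
        sums[path] = hashlib.sha256(data).hexdigest()
    return sums


def run_writers(root, workers=4, count=1000, size=4096):
    """Run several writers concurrently and merge their checksum manifests."""
    manifest = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs = [pool.submit(write_files, root, w, count, size) for w in range(workers)]
        for job in jobs:
            manifest.update(job.result())
    return manifest


def verify_manifest(manifest):
    """Paths that are missing or whose contents no longer match their checksum."""
    bad = []
    for path, digest in sorted(manifest.items()):
        try:
            with open(path, "rb") as f:
                if hashlib.sha256(f.read()).hexdigest() != digest:
                    bad.append(path)
        except OSError:
            bad.append(path)
    return bad


if __name__ == "__main__":
    # Placeholder target: point this at the RAID filesystem instead of a temp dir.
    with tempfile.TemporaryDirectory() as target:
        manifest = run_writers(target, workers=4, count=100, size=4096)
        print(len(manifest), "files written,", len(verify_manifest(manifest)), "mismatches")
```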
    > Yours,
    >
    > Ollie
    >
    > Apr 16 11:34:12 heman /kernel: twe0: <3ware Storage Controller> port 0xc800-0xc80f mem 0xfe000000-0xfe7fffff,0xfe8ffc00-0xfe8ffc0f irq 10 at device 4.0 on pci3
    > Apr 16 11:34:12 heman /kernel: twe0: 8 ports, Firmware FE7X 1.05.00.065, BIOS BE7X 1.08.00.048
    > Apr 16 11:34:12 heman /kernel: twed0: <Unit 0, JBOD, Normal> on twe0
    > Apr 16 11:34:12 heman /kernel: twed0: 4126MB (8452080 sectors)
    > Apr 16 11:34:12 heman /kernel: twed1: <Unit 1, RAID5, Normal> on twe0
    > Apr 16 11:34:12 heman /kernel: twed1: 953896MB (1953580032 sectors)
    > Apr 16 11:34:12 heman /kernel: twe0: command interrupt
    >
    >
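
    Here is a quick check of the sizes in that probe output. The driver
    reports 512-byte sectors, and both MB figures are consistent with
    sectors / 2048, rounded down. Five 250GB disks in RAID5 leave four disks'
    worth of data, about 10^12 bytes, which matches twed1.

    The last line shows what share of the range addressable by a signed
    32-bit sector number (2^31 sectors, 1TiB) this volume uses. Whether any
    layer of this 4.9 stack is actually bound by that limit is an assumption
    to check, not something the thread establishes.

```python
SECTOR_BYTES = 512  # sector size assumed from the twed probe messages


def sectors_to_mb(sectors):
    """Whole MiB, truncated, matching the 'NNNMB (N sectors)' probe lines."""
    return sectors * SECTOR_BYTES // (1024 * 1024)


def raid5_usable_bytes(disks, disk_bytes):
    """RAID5 spends one disk's worth of space on parity."""
    return (disks - 1) * disk_bytes


if __name__ == "__main__":
    array_sectors = 1953580032  # twed1, RAID5
    print(sectors_to_mb(array_sectors))          # 953896, as in the log
    print(array_sectors * SECTOR_BYTES)          # raw bytes on the unit
    print(raid5_usable_bytes(5, 250 * 10**9))    # nominal 5 x 250GB RAID5
    print(f"{array_sectors / 2**31:.1%} of 2^31 sectors")
```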

    -- 
    Doug White                    |  FreeBSD: The Power to Serve
    dwhite@gumbysoft.com          |  www.FreeBSD.org
    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
    
