Re: filesystem corruption with 1TB filesystem, 4.9-STABLE, twe

From: Matthew Reimer (mreimer_at_vpop.net)
Date: 04/30/04


Date: Fri, 30 Apr 2004 12:52:53 -0500
To: stable@freebsd.org

Is your card plugged into a riser card? We had similar problems (random
corruption) with a 7506-8 card. The workaround was to set the speed for
that PCI slot to 33MHz (rather than Auto or 66MHz). I think this tech
note describes our problem:

http://www.3ware.com/kb/article.aspx?id=10848

(Read the PDF file attached to the tech note.)

Now the box is as solid as a rock.

Matt

Doug White wrote:
> On Sun, 18 Apr 2004, Ollie Cook wrote:
>
>
>>I am experiencing filesystem corruption while using an approximately 1TB
>>partition under 4.9-STABLE (sources from Mar 17) and an 8-port 3ware ATA RAID
>>card (twe device driver). The RAID set comprises 5x250GB ATA disks.
>
>
> [...]
>
> The type of corruption you're seeing would be consistent with one of the
> disks not accepting writes or some other sort of array corruption. I
> realize it'll take forever, but can you run an array verify? I wonder if
> the BIOS is missing a disk failure because the disk isn't throwing errors
> but isn't doing any useful work either.
>
>
>
>>The kernel logs such messages as:
>>
>>Apr 17 16:25:37 heman /kernel: free inode /clara/170175645 had 137391860 blocks
>>Apr 17 17:18:29 heman /kernel: free inode /clara/169969279 had 1803039330 blocks
>>Apr 17 18:06:38 heman /kernel: free inode /clara/171086221 had 544501359 blocks
>>
>>The operations it was performing at the time involved copying a lot of small
>>files (email messages) from a busy NFS mount to the RAID5 array. A number of
>>processes were all copying different files and the throughput was around 3MB/s
>>to disk.
>>
>>As far as I can tell from sys/ufs/ffs/ffs_alloc.c this error indicates that a
>>kernel data structure contains unexpected data, but I'm not confident enough to
>>be able to tell what might be causing that.
>>
>>After such messages, if I cleanly unmount the filesystem and run fsck, errors
>>are detected. Such errors are:
>>
>> directory corrupted
>> directory contains empty blocks
>> unallocated inode
>> wrong link counts
>>
>>There are many more distinct error messages, but those are the ones I recall.
>>After a number of passes through fsck, the filesystem is eventually marked
>>clean but quite a number of files wind up in lost+found.
>>
>>Has anyone seen behaviour similar to this with twe RAID sets or large
>>partitions in the past? I've not been able to find reports of similar symptoms
>>using Google.
>>
>>Can anyone offer advice on how I might further debug this problem?
>>
>>Yours,
>>
>>Ollie
>>
>>Apr 16 11:34:12 heman /kernel: twe0: <3ware Storage Controller> port 0xc800-0xc80f mem 0xfe000000-0xfe7fffff,0xfe8ffc00-0xfe8ffc0f irq 10 at device 4.0 on pci3
>>Apr 16 11:34:12 heman /kernel: twe0: 8 ports, Firmware FE7X 1.05.00.065, BIOS BE7X 1.08.00.048
>>Apr 16 11:34:12 heman /kernel: twed0: <Unit 0, JBOD, Normal> on twe0
>>Apr 16 11:34:12 heman /kernel: twed0: 4126MB (8452080 sectors)
>>Apr 16 11:34:12 heman /kernel: twed1: <Unit 1, RAID5, Normal> on twe0
>>Apr 16 11:34:12 heman /kernel: twed1: 953896MB (1953580032 sectors)
>>Apr 16 11:34:12 heman /kernel: twe0: command interrupt
>>
>>
>
>
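
A quick way to sanity-check the numbers in the quoted logs is sketched below.
It is only an illustration, not anything taken from the kernel or the twe
driver. Assumptions: 512-byte sectors, a 32-bit block count, and a
little-endian (i386) machine; none of these is stated in the thread. The
helper names (sectors_to_mb, count_as_bytes, parse_free_inode) are made up
for this example.

```python
import re
import struct

# Assumption: 512-byte sectors (not stated in the thread).
SECTOR_BYTES = 512

# Matches lines such as:
#   free inode /clara/170175645 had 137391860 blocks
FREE_INODE = re.compile(r"free inode (\S+)/(\d+) had (\d+) blocks")


def sectors_to_mb(sectors):
    """Convert a sector count to whole MB (2**20 bytes), truncating."""
    return sectors * SECTOR_BYTES // 2**20


def parse_free_inode(line):
    """Return (mount point, inode number, block count), or None if no match."""
    m = FREE_INODE.search(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


def count_as_bytes(blocks):
    """Show a block count as four raw bytes.

    Assumption: the count is an unsigned 32-bit value stored little-endian.
    """
    return struct.pack("<I", blocks)


if __name__ == "__main__":
    # Sector counts copied from the twed0/twed1 probe lines.
    for unit, sectors in (("twed0", 8452080), ("twed1", 1953580032)):
        print(unit, sectors_to_mb(sectors), "MB")

    # The three "free inode" lines from the quoted kernel log.
    log = [
        "Apr 17 16:25:37 heman /kernel: free inode /clara/170175645 had 137391860 blocks",
        "Apr 17 17:18:29 heman /kernel: free inode /clara/169969279 had 1803039330 blocks",
        "Apr 17 18:06:38 heman /kernel: free inode /clara/171086221 had 544501359 blocks",
    ]
    for line in log:
        mount, inode, blocks = parse_free_inode(line)
        print(mount, inode, hex(blocks), count_as_bytes(blocks))
```

The sector counts reproduce the sizes the driver printed (4126MB and
953896MB), and twed1 works out to about 1.0e12 bytes, roughly four 250GB
disks' worth, which is what RAID5 across five disks would be expected to
expose. Under the stated assumptions, two of the three block counts decode to
printable ASCII ("b2xk" and "ont "). That might hint at file data ending up
where inode metadata should be, but three samples are far too few to conclude
anything.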

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


