RE: Adaptec 3210S, 4.9-STABLE, corruption when disk fails

From: Don Bowman (don_at_SANDVINE.com)
Date: 04/01/05

    Date: Fri, 1 Apr 2005 11:27:47 -0500
    To: "Uwe Doering" <gemini@geminix.org>
    
    

    From: Uwe Doering [mailto:gemini@geminix.org]
     ...
    > As far as I understand this family of controllers the OS
    > drivers aren't involved at all in case of a disk drive
    > failure. It's strictly the controller's business to deal
    > with it internally. The OS just sits there and waits until
    > the controller is done with the retries and either drops into
    > degraded mode or recovers from the disk error.
    >
    > That's why I initially speculated that there might be a
    > timeout somewhere in PostgreSQL or FreeBSD that leads to data
    > loss if the controller is busy for too long.
    >
    > A somewhat radical way to at least make these failures as
    > rare an event as possible would be to deliberately fail all
    > remaining old disk drives, one after the other of course, in
    > order to get rid of them. And if you are lucky the problem
    > won't happen with newer drives anyway, in case the root cause
    > is an incompatibility between the controller and the old drives.

    Started that yesterday. I've got one 'old' one left.
    Sadly, the one that failed the night before last was not one of the
    'old' ones, so this is no guarantee :)

    From the raidutil -e log, I see this type of info. I'm not sure
    what the 'unknown' events are. The 'CRC Failure' is probably the
    problem? There's also Bad SCSI Status, unit attention, etc.
    Perhaps the driver doesn't deal with these properly?

    $ raidutil -e d0
    03/31/2005 23:37:59 Level 1
    Lock for Channel 0 : Started

    03/31/2005 23:37:59 Level 1
    Lock for Channel 1 : Started

    03/31/2005 23:38:09 Level 1
    Lock for Channel 0 : Stopped

    03/31/2005 23:38:22 Level 1
    Lock for Channel 1 : Stopped

    03/31/2005 23:38:22 Level 4
    HBA=0 BUS=0 ID=0 LUN=0
    Status Change
    Optimal => Degraded - Drive Failed

    03/31/2005 23:38:22 Level 1
    Unknown Event : 56 10 00 08 EE 89 4C 42 00 00 00 00

    03/31/2005 23:38:22 Level 1
    CRC Failure
    Number of dirty blocks = -1
    FFFFFFFF D30A1F2A 00000000 00000000 00000000 00000000 00000000 00000000
    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

    03/31/2005 23:38:24 Level 3
    HBA=0 BUS=0 ID=0 LUN=0
    Bad SCSI Status - Check Condition
    28 00 00 00 00 00 00 00 01 00 00 00

    03/31/2005 23:38:24 Level 3
    HBA=0 BUS=0 ID=0 LUN=0
    Request Sense
    70 00 06 00 00 00 00 0A 00 00 00 00 29 02 02 00 00 00
    Unit Attention
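
    The two hex dumps in the Level 3 entries can be decoded by hand. The
    sketch below is only an illustration: it assumes the 12 bytes logged
    under 'Bad SCSI Status' are the failing CDB padded to 12 bytes, and
    that the 'Request Sense' bytes are standard fixed-format sense data.
    The function names and the small lookup tables are made up for this
    example and cover only a few codes.

```python
# Minimal sketch: decode the CDB and sense bytes shown in the raidutil log.
# Assumption: the dumps follow the standard SCSI READ(10) and fixed-format
# sense layouts; the tables below are deliberately incomplete.

SENSE_KEYS = {
    0x0: "No Sense",
    0x1: "Recovered Error",
    0x2: "Not Ready",
    0x3: "Medium Error",
    0x4: "Hardware Error",
    0x5: "Illegal Request",
    0x6: "Unit Attention",
    0xB: "Aborted Command",
}

ASC_ASCQ = {
    (0x29, 0x00): "Power on, reset, or bus device reset occurred",
    (0x29, 0x01): "Power on occurred",
    (0x29, 0x02): "SCSI bus reset occurred",
    (0x29, 0x03): "Bus device reset function occurred",
}


def decode_read10(cdb: bytes) -> dict:
    """Decode a READ(10) CDB: opcode 0x28, 4-byte LBA, 2-byte block count."""
    if cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    return {
        "opcode": "READ(10)",
        "lba": int.from_bytes(cdb[2:6], "big"),
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


def decode_fixed_sense(sense: bytes) -> dict:
    """Decode fixed-format sense data (response code 0x70 or 0x71)."""
    if sense[0] & 0x7F not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = sense[2] & 0x0F
    asc, ascq = sense[12], sense[13]
    return {
        "sense_key": SENSE_KEYS.get(key, "unknown"),
        "asc": asc,
        "ascq": ascq,
        "meaning": ASC_ASCQ.get((asc, ascq), "not in this table"),
    }


cdb = bytes.fromhex("28 00 00 00 00 00 00 00 01 00 00 00")
sense = bytes.fromhex("70 00 06 00 00 00 00 0A 00 00 00 00 29 02 02 00 00 00")

print(decode_read10(cdb))
print(decode_fixed_sense(sense))
```

    Read this way, the entries describe a one-block read at LBA 0 that
    came back with Check Condition, followed by a Unit Attention with
    ASC/ASCQ 29h/02h, i.e. the drive reporting that a SCSI bus reset
    occurred. That would be consistent with the controller resetting the
    bus while handling the failed drive, but it does not by itself
    explain the 'CRC Failure' entry.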

    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"

