MegaRAID 'Bad Slot' Kernel message and crash.

From: Tony Byrne (tonyb_at_byrnehq.com)
Date: 01/11/05

    Date: Tue, 11 Jan 2005 13:40:14 +0000
    To: freebsd-stable@freebsd.org
    
    

    Folks,

    I kicked off a thread just before the holidays regarding some problems
    we are having with an Intel SRCU42X RAID controller in a dual
    processor production server originally under 5.3-STABLE and now
    under 4.10-STABLE. The thread ran out of steam, with no resolution to
    the problem, but I'm hoping that with extra information I might get to
    the bottom of it.

    Basically, after some amount of uptime the kernel emits an "amr0:
    Bad slot x completed" message, and soon afterwards the box goes into a
    partially unresponsive state, forcing us to reboot it. So far the problem
    has only been triggered by the nightly jobs, when the I/O load is higher
    than during the day.
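For anyone who wants to line the message up against the nightly job schedule, here is a minimal Python sketch that scans syslog output for it. It assumes the message has the form quoted above ("amrN: Bad slot <number> completed") and matches it case-insensitively, because the exact capitalisation in the driver source is an assumption. It also assumes the log lives in /var/log/messages unless another path is given. `find_bad_slots` is a hypothetical helper, not part of any FreeBSD tool.

```python
import re
import sys

# Assumed shape of the controller message quoted in the post,
# e.g. "amr0: Bad slot 12 completed". Case-insensitive on purpose.
BAD_SLOT_RE = re.compile(r"\b(amr\d+): bad slot (\d+) completed", re.IGNORECASE)


def find_bad_slots(lines):
    """Return (device, slot, line) tuples for every 'Bad slot' message found."""
    hits = []
    for line in lines:
        m = BAD_SLOT_RE.search(line)
        if m:
            hits.append((m.group(1), int(m.group(2)), line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Default to the usual FreeBSD syslog file; pass another path to override.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    try:
        with open(path, errors="replace") as f:
            for dev, slot, line in find_bad_slots(f):
                print(f"{dev} slot {slot}: {line}")
    except OSError as exc:
        print(f"cannot read {path}: {exc}", file=sys.stderr)
```

The timestamps on the printed lines can then be compared with the cron schedule for the nightly jobs.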

    Before deployment, we tested the box with 5.3-STABLE and managed to
    trigger the problem twice. This forced us to try 4.10-STABLE, which
    was fine in testing and for a number of weeks after deployment.
    However, just before the New Year we saw our first Bad Slot and crash
    under 4.10. Since then it has happened 3 more times. We have upgraded
    the firmware to the latest version available from Intel, and if
    anything this has made the problem worse.

    We're beginning to suspect a dud card, but could do with a few "works
    fine for us" style posts to build confidence in the support for the
    card under FreeBSD. The amr driver doesn't explicitly support the
    card, but as far as we can tell it's a rebadged MegaRAID 320.

    Scott Long has posted to say that he is seeing similar problems,
    but I wonder: if it really is a problem with the driver, wouldn't
    more of you be seeing it?

    The machine has 3 disks configured as a single RAID5 array. A fourth
    disk is configured as a hot standby. The card is equipped with 128MB
    of battery-backed cache. Write-back caching is enabled on the card,
    and read-ahead caching is enabled in non-adaptive mode.
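As a sanity check on this layout, the amrd0 line in the dmesg output (286744576 sectors, 140012MB) can be worked through. This is a minimal sketch with three assumptions: 512-byte sectors, dmesg's "MB" meaning mebibytes, and a hot standby that contributes no capacity. RAID5 over n disks gives (n - 1) disks' worth of usable space, so each of the 3 active members should hold roughly 70006 MiB, ignoring any space the controller reserves for its own metadata.

```python
SECTOR_BYTES = 512  # assumption: 512-byte sectors on the amrd0 logical drive
MIB = 1024 * 1024


def sectors_to_mib(sectors, sector_bytes=SECTOR_BYTES):
    """Convert a sector count to whole mebibytes (what dmesg labels 'MB')."""
    return sectors * sector_bytes // MIB


def raid5_member_mib(usable_mib, disks):
    """RAID5 keeps one disk's worth of parity, so usable = (disks - 1) * member."""
    return usable_mib / (disks - 1)


usable = sectors_to_mib(286744576)    # from "amrd0: 140012MB (286744576 sectors)"
member = raid5_member_mib(usable, 3)  # 3 active disks; the hot standby is excluded
print(f"usable {usable} MiB, roughly {member:.0f} MiB per member disk")
```

The sector count converts exactly to the 140012MB reported, which is consistent with a healthy 3-disk RAID5 logical drive ("optimal" in dmesg).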

    Is anyone else using a SRCU42X RAID card and seeing similar
    problems to ours? What about other cards supported by the amr driver?

    We could just change the controller, but the problem we are having is
    pretty random, and the gap between making a change and seeing its
    outcome is long. We'd like to have more information to work with
    before deciding on the next step.

    uname -a
    FreeBSD xxxxx 4.10-STABLE FreeBSD 4.10-STABLE #7: Tue Nov 16 12:50:42 GMT 2004

    dmesg
    Copyright (c) 1992-2004 The FreeBSD Project.
    Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
            The Regents of the University of California. All rights reserved.
    FreeBSD 4.10-STABLE #7: Tue Nov 16 12:50:42 GMT 2004
        dermot@pooh.traveldev.com:/usr/obj/usr/src/sys/POOH
    Timecounter "i8254" frequency 1193182 Hz
    CPU: Intel(R) Xeon(TM) CPU 3.20GHz (3189.72-MHz 686-class CPU)
      Origin = "GenuineIntel" Id = 0xf25 Stepping = 5
      Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
      Hyperthreading: 2 logical CPUs
    real memory = 4026466304 (3932096K bytes)
    Programming 24 pins in IOAPIC #0
    IOAPIC #0 intpin 2 -> irq 0
    Programming 24 pins in IOAPIC #1
    Programming 24 pins in IOAPIC #2
    FreeBSD/SMP: Multiprocessor motherboard: 4 CPUs
     cpu0 (BSP): apic id: 0, version: 0x00050014, at 0xfee00000
     cpu1 (AP): apic id: 1, version: 0x00050014, at 0xfee00000
     cpu2 (AP): apic id: 6, version: 0x00050014, at 0xfee00000
     cpu3 (AP): apic id: 7, version: 0x00050014, at 0xfee00000
     io0 (APIC): apic id: 8, version: 0x00178020, at 0xfec00000
     io1 (APIC): apic id: 9, version: 0x00178020, at 0xfec81000
     io2 (APIC): apic id: 10, version: 0x00178020, at 0xfec81400
    Preloaded elf kernel "kernel" at 0xc03cc000.
    Preloaded userconfig_script "/boot/kernel.conf" at 0xc03cc09c.
    Warning: Pentium 4 CPU: PSE disabled
    Pentium Pro MTRR support enabled
    md0: Malloc disk
    Using $PIR table, 19 entries at 0xc00f3630
    npx0: <math processor> on motherboard
    npx0: INT 16 interface
    pcib0: <Host to PCI bridge> on motherboard
    IOAPIC #0 intpin 16 -> irq 2
    IOAPIC #0 intpin 19 -> irq 16
    pci0: <PCI bus> on pcib0
    pci0: <unknown card> (vendor=0x8086, dev=0x2541) at 0.1
    pcib1: <PCI to PCI bridge (vendor=8086 device=2545)> at device 3.0 on pci0
    pci2: <PCI bus> on pcib1
    pci2: <unknown card> (vendor=0x8086, dev=0x1461) at 28.0
    pcib2: <PCI to PCI bridge (vendor=8086 device=1460)> at device 29.0 on pci2
    IOAPIC #2 intpin 2 -> irq 18
    IOAPIC #2 intpin 1 -> irq 19
    pci5: <PCI bus> on pcib2
    ahd0: <Adaptec AIC7902 Ultra320 SCSI adapter> port 0x4000-0x40ff,0x3800-0x38ff mem 0xfe9e0000-0xfe9e1fff irq 18 at device 7.0 on pci5
    aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
    ahd1: <Adaptec AIC7902 Ultra320 SCSI adapter> port 0x3400-0x34ff,0x3000-0x30ff mem 0xfe9f0000-0xfe9f1fff irq 19 at device 7.1 on pci5
    aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
    pci2: <unknown card> (vendor=0x8086, dev=0x1461) at 30.0
    pcib3: <PCI to PCI bridge (vendor=8086 device=1460)> at device 31.0 on pci2
    IOAPIC #1 intpin 6 -> irq 20
    IOAPIC #1 intpin 7 -> irq 21
    pci3: <PCI bus> on pcib3
    em0: <Intel(R) PRO/1000 Network Connection, Version - 1.7.35> port 0x2040-0x207f mem 0xfe6c0000-0xfe6dffff irq 20 at device 7.0 on pci3
    em0: Speed:N/A Duplex:N/A
    em1: <Intel(R) PRO/1000 Network Connection, Version - 1.7.35> port 0x2000-0x203f mem 0xfe6e0000-0xfe6fffff irq 21 at device 7.1 on pci3
    em1: Speed:N/A Duplex:N/A
    pcib4: <PCI to PCI bridge (vendor=1014 device=01a7)> at device 9.0 on pci3
    IOAPIC #1 intpin 3 -> irq 22
    pci4: <PCI bus> on pcib4
    amr0: <LSILogic MegaRAID> mem 0xfe580000-0xfe5fffff,0xfbef0000-0xfbefffff irq 22 at device 0.0 on pci4
    amr0: <LSILogic Intel(R) RAID Controller SRCU42X> Firmware 413Y, BIOS H420, 128MB RAM
    pci0: <unknown card> (vendor=0x8086, dev=0x2546) at 3.1
    uhci0: <Intel 82801CA/CAM (ICH3) USB controller USB-A> port 0x5020-0x503f irq 2 at device 29.0 on pci0
    usb0: <Intel 82801CA/CAM (ICH3) USB controller USB-A> on uhci0
    usb0: USB revision 1.0
    uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
    uhub0: 2 ports with 2 removable, self powered
    uhci1: <Intel 82801CA/CAM (ICH3) USB controller USB-B> port 0x5000-0x501f irq 16 at device 29.1 on pci0
    usb1: <Intel 82801CA/CAM (ICH3) USB controller USB-B> on uhci1
    usb1: USB revision 1.0
    uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
    uhub1: 2 ports with 2 removable, self powered
    pcib5: <Intel 82801BA/BAM (ICH2) Hub to PCI bridge> at device 30.0 on pci0
    pci1: <PCI bus> on pcib5
    pci1: <ATI Mach64-GR graphics accelerator> at 12.0 irq 17
    isab0: <PCI to ISA bridge (vendor=8086 device=2480)> at device 31.0 on pci0
    isa0: <ISA bus> on isab0
    atapci0: <Intel ICH3 ATA100 controller> port 0x3a0-0x3af,0-0x3,0-0x7,0-0x3,0-0x7 irq 0 at device 31.1 on pci0
    ata0: at 0x1f0 irq 14 on atapci0
    ata1: at 0x170 irq 15 on atapci0
    pci0: <unknown card> (vendor=0x8086, dev=0x2483) at 31.3 irq 17
    orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc8fff,0xc9000-0xc9fff on isa0
    pmtimer0 on isa0
    atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
    atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
    kbd0 at atkbd0
    psm0: <PS/2 Mouse> irq 12 on atkbdc0
    psm0: model Generic PS/2 mouse, device ID 0
    vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
    sc0: <System console> at flags 0x100 on isa0
    sc0: VGA <16 virtual consoles, flags=0x300>
    sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
    sio0: type 16550A
    sio1 at port 0x2f8-0x2ff irq 3 on isa0
    sio1: type 16550A
    APIC_IO: Testing 8254 interrupt delivery
    APIC_IO: routing 8254 via IOAPIC #0 intpin 2
    SMP: AP CPU #2 Launched!
    SMP: AP CPU #1 Launched!
    SMP: AP CPU #3 Launched!
    acd0: CDROM <SAMSUNG CD-ROM SN-124> at ata1-master PIO4
    Waiting 15 seconds for SCSI devices to settle
    amrd0: <LSILogic MegaRAID logical drive> on amr0
    amrd0: 140012MB (286744576 sectors) RAID 5 (optimal)
    pass0 at amr0 bus 0 target 6 lun 0
    pass0: <ESG-SHV SCA HSBP M22 0.06> Fixed Processor SCSI-2 device
    Mounting root from ufs:/dev/amrd0s1a

    Regards,

    Tony.

    -- 
    Tony Byrne
    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
    
