Re: FBSD 5.4-STABLE/3Ware Escalade 7506-4LP on dual Opteron issue

From: Tony Shadwick (tshadwick_at_goinet.com)
Date: 06/09/05

    Date: Thu, 9 Jun 2005 09:32:07 -0500 (CDT)
    To: Steve Richardson <prefect@sidehack.sat.gweep.net>
    
    

    I'm not claiming this will fix your issue, but are you running the
    absolute latest kernel sources? It is possible this issue has
    been resolved in a newer kernel.

    cvsup your sources, rebuild world and kernel, and see what happens.
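    A minimal sketch of that update-and-rebuild cycle, assuming a supfile modeled on the example stable-supfile in /usr/share/examples/cvsup (branch tag RELENG_5 for 5-STABLE) and the SIDEHACK kernel config named in your dmesg. The mirror host (CHANGE_THIS is the placeholder the example supfile uses; substitute a real mirror near you), the supfile path, and the helper names write_supfile and print_build_steps are my own hypothetical choices. The script prints the build commands rather than running them, so the sequence can be reviewed first.

```shell
#!/bin/sh
# Sketch: track 5-STABLE with cvsup, then rebuild world and kernel.
# Assumptions: supfile path, mirror host placeholder, RELENG_5 tag.
# SIDEHACK is the kernel config name taken from the dmesg output.

SUPFILE=${SUPFILE:-/root/stable-supfile}
KERNCONF=${KERNCONF:-SIDEHACK}

# Write a minimal supfile that fetches all of src for the 5-STABLE branch.
write_supfile() {
    printf '%s\n' \
        '*default host=CHANGE_THIS.FreeBSD.org' \
        '*default base=/var/db' \
        '*default prefix=/usr' \
        '*default release=cvs tag=RELENG_5' \
        '*default delete use-rel-suffix' \
        '*default compress' \
        'src-all' > "$SUPFILE"
}

# Print the usual update sequence instead of executing it.
print_build_steps() {
    printf '%s\n' \
        "cvsup -g -L 2 $SUPFILE" \
        "cd /usr/src && make buildworld" \
        "cd /usr/src && make buildkernel KERNCONF=$KERNCONF" \
        "cd /usr/src && make installkernel KERNCONF=$KERNCONF" \
        "# reboot into single-user mode, then:" \
        "mergemaster -p" \
        "cd /usr/src && make installworld" \
        "mergemaster" \
        "# reboot"
}
```

    Usage: run write_supfile, edit the host line, then work through the output of print_build_steps by hand.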

    On Thu, 9 Jun 2005, Steve Richardson wrote:

    >
    > Hi,
    >
    > We're building out a brand new dual Opteron box to run our public access unix
    > site. We're running FreeBSD 5.4 and a 3Ware Escalade 7506-4LP. We are
    > having difficulties with the system, and any help you can offer would be
    > greatly appreciated.
    >
    > For the most part, everything behaves fine. We've got the system built and
    > installed. Unfortunately, we're having a periodic, catastrophic failure
    > involving the 3Ware card.
    >
    > Periodically, the system will partly lock up with the following errors:
    >
    > twe0: unexpected status bit(s) 100000<PCIABRT>
    > twe0: PCI abort, clearing.
    >
    > I say partly lock up because the kernel does not panic, nor do the console
    > keyboard or network interfaces become non-responsive (i.e. you can type
    > stuff at the login prompt, and ping the server). However, the disk
    > subsystem does appear to cease functioning once this has occurred.
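    Since the array stops responding, the abort may never reach /var/log/messages, which probably sits on the same array as your root filesystem (twed0s1a), so the kernel message buffer or a serial console capture is the more reliable record. A minimal sketch for summarizing those lines, assuming the value after "status bit(s)" is hex and that PCIABRT corresponds to 0x00100000, which is what the driver's own <PCIABRT> label on the value 100000 implies. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: summarize twe0 abort messages read from stdin,
# e.g. `dmesg | summarize_twe_errors`.
# Assumption: the status value is hex and PCIABRT is bit 20 (0x00100000).

PCIABRT=$((1 << 20))

# Report whether the PCIABRT bit is set in a hex status value.
decode_twe_status() {
    status=$((0x$1))
    if [ $((status & PCIABRT)) -ne 0 ]; then
        echo "0x$1: PCIABRT set"
    else
        echo "0x$1: PCIABRT clear"
    fi
}

# Decode each "unexpected status" line and count "PCI abort" lines.
summarize_twe_errors() {
    aborts=0
    while IFS= read -r line; do
        case $line in
            *'twe0: unexpected status bit(s) '*)
                # Keep only the hex value between "bit(s) " and "<".
                value=${line#*"status bit(s) "}
                value=${value%%"<"*}
                decode_twe_status "$value"
                ;;
            *'twe0: PCI abort, clearing.'*)
                aborts=$((aborts + 1))
                ;;
        esac
    done
    echo "PCI abort events: $aborts"
}
```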
    >
    > Frankly at this point we are baffled, because the system is stable enough to
    > run for days on end under light load, and will even occasionally handle
    > periods of medium disk load (e.g. many hours of rsyncing from our live
    > server, build world, etc).
    >
    > We have been using the bonnie++ hard disk benchmarking suite as a means for
    > recreating the problem, as follows:
    >
    >> mkdir testdir
    >> bonnie++ -d ./testdir -s 2g -n 100:500000:1000 -x 100
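    To pin down when the abort happens during that run, something like the following could poll the kernel message buffer in the background. A sketch only: read_kmsg and watch_for_abort are hypothetical helper names and the polling interval is arbitrary. Because root is mounted from twed0s1a, running dmesg, grep, or date after the abort may hang unless their binaries are already in the buffer cache; calling date on every pass makes that likely, though not guaranteed. Start it on a freshly booted system so an old abort still in the message buffer does not trigger it immediately.

```shell
#!/bin/sh
# Sketch: poll kernel messages while bonnie++ runs and report when the
# twe0 PCI abort first appears. Hypothetical helpers: read_kmsg,
# watch_for_abort.

# Source of kernel messages; kept separate so it is easy to swap out.
read_kmsg() {
    dmesg
}

watch_for_abort() {
    interval=${1:-60}   # seconds between checks (arbitrary default)
    last_ok=''
    while :; do
        # Run date every pass so its binary is likely to stay cached.
        now=$(date)
        if read_kmsg | grep -q 'twe0: PCI abort'; then
            echo "twe0 PCI abort seen by $now (last clean check: ${last_ok:-none})"
            return 0
        fi
        last_ok=$now
        sleep "$interval"
    done
}
```

    Usage: start `watch_for_abort 30 &` from the console, then kick off the bonnie++ loop.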
    >
    > I've included system information below, including dmesg output.
    >
    >
    > regards,
    > Steve Richardson
    > System Administrator
    > GweepNet Cooperative Network
    >
    >
    >
    > System Description:
    > Gigabyte GA-7A8DW motherboard
    > (2) AMD Opteron 246 2GHz CPUs
    > 2GB Samsung PC3200 ECC RAM
    > 3Ware Escalade 7506-4LP parallel ATA RAID, installed in 64 bit PCI slot
    >
    > OS:
    > FreeBSD 5.4-STABLE FreeBSD 5.4-STABLE amd64
    >
    >
    > dmesg output:
    >
    > Copyright (c) 1992-2005 The FreeBSD Project.
    > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    > The Regents of the University of California. All rights reserved.
    > FreeBSD 5.4-STABLE #2: Tue Jun 7 00:10:29 EDT 2005
    > root@newsidey.gweep.net:/usr/obj/usr/src/sys/SIDEHACK
    > Timecounter "i8254" frequency 1193182 Hz quality 0
    > CPU: AMD Opteron(tm) Processor 246 (1993.79-MHz K8-class CPU)
    > Origin = "AuthenticAMD" Id = 0xf5a Stepping = 10
    > Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
    > AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow+,3DNow>
    > real memory = 2146893824 (2047 MB)
    > avail memory = 2061205504 (1965 MB)
    > ACPI APIC Table: <PTLTD APIC >
    > FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
    > cpu0 (BSP): APIC ID: 0
    > cpu1 (AP): APIC ID: 1
    > MADT: Forcing active-low polarity and level trigger for SCI
    > ioapic0 <Version 1.1> irqs 0-23 on motherboard
    > ioapic1 <Version 1.1> irqs 24-27 on motherboard
    > ioapic2 <Version 1.1> irqs 28-31 on motherboard
    > acpi0: <PTLTD XSDT> on motherboard
    > acpi0: Power Button (fixed)
    > acpi0: Sleep Button (fixed)
    > acpi_bus_number: can't get _ADR
    > acpi_bus_number: can't get _ADR
    > acpi_bus_number: can't get _ADR
    > unknown: I/O range not supported
    > unknown: I/O range not supported
    > ACPI-1304: *** Error: Method execution failed [\\_SB_.PCI0.LPC_.LPT_._CRS] (Node 0xffffff0000a70080), AE_AML_BUFFER_LIMIT
    > ACPI-0239: *** Error: Method execution failed [\\_SB_.PCI0.LPC_.LPT_._CRS] (Node 0xffffff0000a70080), AE_AML_BUFFER_LIMIT
    > can't fetch resources for \\_SB_.PCI0.LPC_.LPT_ - AE_AML_BUFFER_LIMIT
    > Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
    > acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
    > cpu0: <ACPI CPU> on acpi0
    > cpu1: <ACPI CPU> on acpi0
    > acpi_button0: <Power Button> on acpi0
    > pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
    > pci0: <ACPI PCI bus> on pcib0
    > pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
    > pci1: <ACPI PCI bus> on pcib1
    > pci1: <display, VGA> at device 0.0 (no driver attached)
    > pcib2: <ACPI PCI-PCI bridge> at device 6.0 on pci0
    > pci2: <ACPI PCI bus> on pcib2
    > ohci0: <OHCI (generic) USB controller> mem 0xd0110000-0xd0110fff irq 19 at device 0.0 on pci2
    > usb0: OHCI version 1.0, legacy support
    > usb0: SMM does not respond, resetting
    > usb0: <OHCI (generic) USB controller> on ohci0
    > usb0: USB revision 1.0
    > uhub0: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
    > uhub0: 3 ports with 3 removable, self powered
    > ohci1: <OHCI (generic) USB controller> mem 0xd0111000-0xd0111fff irq 19 at device 0.1 on pci2
    > usb1: OHCI version 1.0, legacy support
    > usb1: SMM does not respond, resetting
    > usb1: <OHCI (generic) USB controller> on ohci1
    > usb1: USB revision 1.0
    > uhub1: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
    > uhub1: 3 ports with 3 removable, self powered
    > ahc0: <Adaptec 2902/04/10/15/20C/30C SCSI adapter> port 0x3000-0x30ff mem 0xd0112000-0xd0112fff irq 17 at device 4.0 on pci2
    > aic7850: Single Channel A, SCSI Id=7, 3/253 SCBs
    > bge0: <Broadcom BCM5705 Gigabit Ethernet, ASIC rev. 0x3003> mem 0xd0100000-0xd010ffff irq 19 at device 5.0 on pci2
    > miibus0: <MII bus> on bge0
    > brgphy0: <BCM5705 10/100/1000baseTX PHY> on miibus0
    > brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
    > bge0: Ethernet address: 00:0f:ea:7e:b1:81
    > atapci0: <SiI 3114 SATA150 controller> port 0x3400-0x340f,0x3410-0x3413,0x3418-0x341f,0x3414-0x3417,0x3420-0x3427 mem 0xd0113000-0xd01133ff irq 18 at device 6.0 on pci2
    > ata2: channel #0 on atapci0
    > ata3: channel #1 on atapci0
    > ata4: channel #2 on atapci0
    > ata5: channel #3 on atapci0
    > isab0: <PCI-ISA bridge> at device 7.0 on pci0
    > isa0: <ISA bus> on isab0
    > atapci1: <AMD 8111 UDMA133 controller> port 0x1000-0x100f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 7.1 on pci0
    > ata0: channel #0 on atapci1
    > ata1: channel #1 on atapci1
    > pci0: <bridge> at device 7.3 (no driver attached)
    > pcib3: <ACPI Host-PCI bridge> on acpi0
    > pci8: <ACPI PCI bus> on pcib3
    > pcib4: <ACPI PCI-PCI bridge> at device 3.0 on pci8
    > pci9: <ACPI PCI bus> on pcib4
    > pci8: <base peripheral, interrupt controller> at device 3.1 (no driver attached)
    > pcib5: <ACPI PCI-PCI bridge> at device 4.0 on pci8
    > pci14: <ACPI PCI bus> on pcib5
    > twe0: <3ware Storage Controller. Driver version 1.50.01.002> port 0x4000-0x400f mem 0xf0800000-0xf0ffffff irq 30 at device 2.0 on pci14
    > twe0: 4 ports, Firmware FE7X 1.05.00.068, BIOS BE7X 1.08.00.048
    > pci8: <base peripheral, interrupt controller> at device 4.1 (no driver attached)
    > atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
    > atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
    > kbd0 at atkbd0
    > fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
    > sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
    > sio0: type 16550A
    > sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
    > sio1: type 16550A
    > ppc0: cannot reserve I/O port range
    > ppc0: cannot reserve I/O port range
    > orm0: <ISA Option ROMs> at iomem 0xd0000-0xd0fff,0xc0000-0xcffff on isa0
    > ppc0: cannot reserve I/O port range
    > sc0: <System console> at flags 0x100 on isa0
    > sc0: VGA <16 virtual consoles, flags=0x300>
    > vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
    > Timecounters tick every 1.000 msec
    > ahc0: Someone reset channel A
    > ad0: 152627MB <SAMSUNG SP1614N/TM100-30> [310101/16/63] at ata0-master UDMA100
    > ad2: 286188MB <Maxtor 6B300R0/BAH41B70> [581463/16/63] at ata1-master UDMA133
    > Waiting 15 seconds for SCSI devices to settle
    > twed0: <Unit 0, RAID5, Normal> on twe0
    > twed0: 305253MB (625159424 sectors)
    > sa0 at ahc0 bus 0 target 3 lun 0
    > sa0: <EXABYTE EXB-89008E00012F V39e> Removable Sequential Access SCSI-2 device
    > sa0: 10.000MB/s transfers (10.000MHz, offset 15)
    > SMP: AP CPU #1 Launched!
    > Mounting root from ufs:/dev/twed0s1a
    > WARNING: / was not properly dismounted
    > WARNING: /home/crib was not properly dismounted
    > WARNING: /home/domus was not properly dismounted
    > WARNING: /tmp was not properly dismounted
    > WARNING: /u was not properly dismounted
    > WARNING: /u/backup/nearline was not properly dismounted
    > WARNING: /u/backup/online was not properly dismounted
    > WARNING: /u/news was not properly dismounted
    > WARNING: /u/news/nntpcached was not properly dismounted
    > WARNING: /usr was not properly dismounted
    > WARNING: /var was not properly dismounted
    > WARNING: /var/tmp was not properly dismounted
    > bge0: firmware handshake timed out
    > bge0: RX CPU self-diagnostics failed!
    > bge0: watchdog timeout -- resetting
    > _______________________________________________
    > freebsd-questions@freebsd.org mailing list
    > http://lists.freebsd.org/mailman/listinfo/freebsd-questions
    > To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"
    >

