reliable disk FAILURE

From: Elliot Finley (lists_at_efinley.com)
Date: 03/15/04

    To: <freebsd-current@freebsd.org>
    Date: Mon, 15 Mar 2004 11:31:03 -0700
    
    

    When doing a disk-to-disk backup using dump/restore, I reliably get a disk
    failure. I don't think it's the disk, because it happens on six different
    machines. All six machines use SATA drives, and all of them have an ASUS
    P4P800 motherboard.

    None of the six machines had any problems until after the last security
    patch to 5.2.1. After the patch, they all fail. If I remember correctly,
    the last security patch only touched some TCP files, so the disk failures
    don't make any sense to me.

    The commands that cause the failure, the console output, and the dmesg
    output are below. This is on a test machine that I can take down or modify
    at any time, so if there is anything further I can do to help debug this,
    please let me know.

    sequence of commands issued to cause failure
    ----------------------------------------------
    Executing command: /bin/dd if=/dev/zero of=/dev/ad14 bs=1k count=1
    Executing command: /sbin/fdisk -BI ad14
    Executing command: /sbin/bsdlabel -w -B ad14s1 auto
    Executing command: /sbin/bsdlabel ad14s1 > /tmp/backup.disk.label
    Executing command: /bin/echo 'a: 2097152 0 4.2BSD' >> /tmp/backup.disk.label
    Executing command: /bin/echo 'b: 4194304 * swap' >> /tmp/backup.disk.label
    Executing command: /bin/echo 'd: 125829120 * 4.2BSD' >> /tmp/backup.disk.label
    Executing command: /bin/echo 'e: * * 4.2BSD' >> /tmp/backup.disk.label
    Executing command: /sbin/bsdlabel -R -B ad14s1 /tmp/backup.disk.label
    Executing command: /sbin/newfs -U /dev/ad14s1a
    Executing command: /sbin/newfs -U /dev/ad14s1d
    Executing command: /sbin/newfs -U /dev/ad14s1e
    Executing command: /sbin/mount -rw /dev/ad14s1a /mnt
    Executing command: /sbin/dump -0Lf - / | (cd /mnt; /sbin/restore -rf -)
    Executing command: /sbin/umount /mnt
    Executing command: /sbin/mount -rw /dev/ad14s1d /mnt
    Executing command: /sbin/dump -0Lf - /usr | (cd /mnt; /sbin/restore -rf -)
      DUMP: Date of this level 0 dump: Mon Mar 15 10:32:02 2004
      DUMP: Date of last level 0 dump: the epoch
      DUMP: Dumping snapshot of /dev/ad12s1d (/usr) to standard output
      DUMP: mapping (Pass I) [regular files]
      DUMP: mapping (Pass II) [directories]
      DUMP: estimated 1621128 tape blocks.
      DUMP: dumping (Pass III) [directories]
      DUMP: dumping (Pass IV) [regular files]
    warning: ./.snap: File exists
    (dump/restore dies here this time; it doesn't die in the same place every
    time. The failure causes the following output on the console.)
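    The bsdlabel entries in the sequence above give partition sizes as counts
    of 512-byte sectors. As a quick sanity check on the layout, here is a
    minimal sketch that converts them to GiB. It assumes 512-byte sectors
    (standard for SATA drives of this era, but not shown in the logs), and the
    helper name sectors_to_gib is hypothetical.

```shell
#!/bin/sh
# Assumption: the bsdlabel sizes above are counts of 512-byte sectors.
# 1 GiB = 1073741824 bytes / 512 bytes per sector = 2097152 sectors.
SECTORS_PER_GIB=2097152

# Hypothetical helper: print a sector count as whole GiB. Dividing the
# sector count directly (instead of first multiplying up to bytes) keeps
# the arithmetic small enough for a 32-bit shell.
sectors_to_gib() {
    echo $(( $1 / SECTORS_PER_GIB ))
}

# Partition letter and size, copied from the /bin/echo commands above.
for entry in a:2097152 b:4194304 d:125829120; do
    part=${entry%%:*}
    sectors=${entry#*:}
    echo "$part: $sectors sectors = $(sectors_to_gib "$sectors") GiB"
done
```

    This prints 1 GiB for a, 2 GiB for b and 60 GiB for d. The e entry
    ('* *') takes the remaining space in the slice, so its size depends on the
    slice length and is not computed here.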

    console output
    ---------------
    ad12: TIMEOUT - READ_DMA retrying (2 retries left) LBA=28166139
    ad12: timeout sending command=c8
    ad12: error issuing DMA command
    GEOM: create disk ad12 dp=0xc6ded160
    ad12: 76319MB <ST380013AS> [155061/16/63] at ata6-master UDMA100
    ad12: FAILURE - SETFEATURES SET TRANSFER MODE timed out

    dmesg
    ------
    Copyright (c) 1992-2004 The FreeBSD Project.
    Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
            The Regents of the University of California. All rights reserved.
    FreeBSD 5.2.1-RELEASE-p1 #5: Fri Mar 5 17:54:52 MST 2004
        root@oregon.etv.net:/usr/obj/usr/src/sys/GENERIC
    Preloaded elf kernel "/boot/kernel/kernel" at 0xc0a35000.
    Preloaded elf module "/boot/kernel/acpi.ko" at 0xc0a3521c.
    ACPI APIC Table: <A M I OEMAPIC >
    Timecounter "i8254" frequency 1193182 Hz quality 0
    CPU: Intel(R) Pentium(R) 4 CPU 2.60GHz (2598.76-MHz 686-class CPU)
      Origin = "GenuineIntel" Id = 0xf29 Stepping = 9
      Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
      Hyperthreading: 2 logical CPUs
    real memory = 1072889856 (1023 MB)
    avail memory = 1032749056 (984 MB)
    FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
     cpu0 (BSP): APIC ID: 0
     cpu1 (AP): APIC ID: 1
    ioapic0 <Version 2.0> irqs 0-23 on motherboard
    Pentium Pro MTRR support enabled
    npx0: [FAST]
    npx0: <math processor> on motherboard
    npx0: INT 16 interface
    acpi0: <A M I OEMXSDT > on motherboard
    pcibios: BIOS version 2.10
    Using $PIR table, 14 entries at 0xc00f5410
    acpi0: Power Button (fixed)
    Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
    acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
    acpi_cpu0: <CPU> on acpi0
    acpi_cpu1: <CPU> on acpi0
    pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
    pci0: <ACPI PCI bus> on pcib0
    agp0: <Intel 82865 host to AGP bridge> mem 0xf8000000-0xfbffffff at device 0.0 on pci0
    pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
    pcib1: could not get PCI interrupt routing table for \\_SB_.PCI0.P0P1 - AE_NOT_FOUND
    pci1: <ACPI PCI bus> on pcib1
    pci1: <display, VGA> at device 0.0 (no driver attached)
    uhci0: <Intel 82801EB (ICH5) USB controller USB-A> port 0xef00-0xef1f irq 16 at device 29.0 on pci0
    usb0: <Intel 82801EB (ICH5) USB controller USB-A> on uhci0
    usb0: USB revision 1.0
    uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
    uhub0: 2 ports with 2 removable, self powered
    uhci1: <Intel 82801EB (ICH5) USB controller USB-B> port 0xef20-0xef3f irq 19 at device 29.1 on pci0
    usb1: <Intel 82801EB (ICH5) USB controller USB-B> on uhci1
    usb1: USB revision 1.0
    uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
    uhub1: 2 ports with 2 removable, self powered
    uhci2: <Intel 82801EB (ICH5) USB controller USB-C> port 0xef40-0xef5f irq 18 at device 29.2 on pci0
    usb2: <Intel 82801EB (ICH5) USB controller USB-C> on uhci2
    usb2: USB revision 1.0
    uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
    uhub2: 2 ports with 2 removable, self powered
    uhci3: <Intel 82801EB (ICH5) USB controller USB-D> port 0xef80-0xef9f irq 16 at device 29.3 on pci0
    usb3: <Intel 82801EB (ICH5) USB controller USB-D> on uhci3
    usb3: USB revision 1.0
    uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
    uhub3: 2 ports with 2 removable, self powered
    pci0: <serial bus, USB> at device 29.7 (no driver attached)
    pcib2: <ACPI PCI-PCI bridge> at device 30.0 on pci0
    pci2: <ACPI PCI bus> on pcib2
    skc0: <3Com 3C940 Gigabit Ethernet> port 0xd800-0xd8ff mem 0xfeafc000-0xfeafffff irq 22 at device 5.0 on pci2
    skc0: 3Com Gigabit LOM (3C940)
    sk0: <Marvell Semiconductor, Inc. Yukon> on skc0
    sk0: Ethernet address: 00:0c:6e:54:4b:25
    miibus0: <MII bus> on sk0
    e1000phy0: <Marvell 88E1000 Gigabit PHY> on miibus0
    e1000phy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX, auto
    atapci0: <Promise PDC20319 SATA150 controller> port 0xdc00-0xdc7f,0xdfa0-0xdfaf,0xdf00-0xdf3f mem 0xfeac0000-0xfeadffff,0xfeafb000-0xfeafbfff irq 21 at device 9.0 on pci2
    atapci0: [MPSAFE]
    ata2: at 0xfeafb000 on atapci0
    ata2: [MPSAFE]
    ata3: at 0xfeafb000 on atapci0
    ata3: [MPSAFE]
    ata4: at 0xfeafb000 on atapci0
    ata4: [MPSAFE]
    ata5: at 0xfeafb000 on atapci0
    ata5: [MPSAFE]
    fxp0: <Intel 82550 Pro/100 Ethernet> port 0xde80-0xdebf mem 0xfeaa0000-0xfeabffff,0xfeafa000-0xfeafafff irq 23 at device 11.0 on pci2
    fxp0: Ethernet address 00:02:b3:d1:f7:ad
    miibus1: <MII bus> on fxp0
    inphy0: <i82555 10/100 media interface> on miibus1
    inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
    isab0: <PCI-ISA bridge> at device 31.0 on pci0
    isa0: <ISA bus> on isab0
    atapci1: <Intel ICH5 UDMA100 controller> port 0xfc00-0xfc0f,0-0x3,0-0x7,0-0x3,0-0x7 at device 31.1 on pci0
    ata0: at 0x1f0 irq 14 on atapci1
    ata0: [MPSAFE]
    ata1: at 0x170 irq 15 on atapci1
    ata1: [MPSAFE]
    atapci2: <Intel ICH5 SATA150 controller> port 0xef60-0xef6f,0xefa8-0xefab,0xefa0-0xefa7,0xefac-0xefaf,0xefe0-0xefe7 irq 18 at device 31.2 on pci0
    atapci2: [MPSAFE]
    ata6: at 0xefe0 on atapci2
    ata6: [MPSAFE]
    ata7: at 0xefa0 on atapci2
    ata7: [MPSAFE]
    pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
    pci0: <multimedia, audio> at device 31.5 (no driver attached)
    acpi_button0: <Power Button> on acpi0
    atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
    atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
    kbd0 at atkbd0
    psm0: <PS/2 Mouse> irq 12 on atkbdc0
    psm0: model IntelliMouse, device ID 3
    sio0 port 0x3f8-0x3ff irq 4 on acpi0
    sio0: type 16550A
    sio1 port 0x2e8-0x2ef irq 3 on acpi0
    sio1: type 16550A
    ppc0 port 0x378-0x37f irq 7 on acpi0
    ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
    ppbus0: <Parallel port bus> on ppc0
    plip0: <PLIP network interface> on ppbus0
    lpt0: <Printer> on ppbus0
    lpt0: Interrupt-driven port
    ppi0: <Parallel I/O> on ppbus0
    orm0: <Option ROMs> at iomem 0xc8000-0xc97ff,0xc0000-0xc7fff on isa0
    pmtimer0 on isa0
    fdc0: ready for input in output
    fdc0: cmd 3 failed at out byte 1 of 3
    sc0: <System console> at flags 0x100 on isa0
    sc0: VGA <16 virtual consoles, flags=0x300>
    vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
    Timecounters tick every 10.000 msec
    acd0: CDROM <FX54++M> at ata0-master PIO4
    GEOM: create disk ad12 dp=0xc6b91060
    ad12: 76319MB <ST380013AS> [155061/16/63] at ata6-master UDMA100
    GEOM: create disk ad14 dp=0xc69d1d60
    ad14: 76319MB <ST380013AS> [155061/16/63] at ata7-master UDMA100
    SMP: AP CPU #1 Launched!
    Mounting root from ufs:/dev/ad12s1a
    WARNING: / was not properly dismounted
    WARNING: /usr was not properly dismounted
    WARNING: /var was not properly dismounted

    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"

