Re: 6.2: reproducible hang on amd64, traced to 24h of commits



fwiw, i have not traced it down to a commit (got fed up with hangs), but conclusively singled out smartmontools as the trigger.
after adding 2 more disks, machine wouldn't even boot up past starting smartmontools, locking up hard with the same symptoms.
with smartmontools disabled, it booted up and has been up for > 2 months now.

Deomid Ryabkov wrote:
ok, now that the machine has been up for 10 days, i am reasonably sure i've gotten close enough to this one.

back in january i cvsupped to -STABLE and the box (dual head opteron box) started hanging.
and i mean it dies completely.
i have all debug options and a working serial console, but still it just dies and both serial and system console are unresponsive.
no panic message on either, nothing. pretty sad.

the kernel config is vanilla SMP GENERIC, with all debug options i could think of enabled (after it started hanging).

so the first thing i did after rebooting the box a couple of times was fall back to kernel.old (6.1-STABLE circa august '06).
no hangs. i then started incrementally updating, gradually getting closer to jan 22.
long story short, i seem to have isolated the problem to commits made between
date=2006.12.28.00.00.00 and date=2006.12.29.00.00.00.
last hang i had was when running the 12/29 kernel, now it's 12/28 and the box has been up for 2 weeks already.
based on previous experience i'm pretty certain that this is it. with the bad kernel the box would never stay up more than a few days, never more than 5.
between 12/28 and 12/29 i see some changes to /sys/amd64/ and /sys/pci/, which might've been the cause.
i will probably start looking into individual changes, but if anyone more experienced than me could take a look, it'd be appreciated.
i am willing to try patches.
i confirmed that recent (as of 3 weeks or so) -STABLE still has this problem.
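
for anyone repeating this: the message above describes stepping forward incrementally. a binary search over dates is a different technique that reaches the same one-day window in fewer builds. a minimal sketch, assuming both endpoints are at midnight and that is_bad() is a hypothetical callback you answer by hand after building and running the kernel for that date (the august day used below is made up, the post only says "circa august '06"):

```python
from datetime import datetime, timedelta

FMT = "%Y.%m.%d.%H.%M.%S"  # same layout as the date= values quoted above


def bisect_dates(good, bad, is_bad):
    """Narrow a known-good/known-bad date pair down to adjacent days.

    is_bad(date_string) is a hypothetical callback: it should return True
    if a kernel built from sources as of that date hangs.
    """
    g = datetime.strptime(good, FMT)
    b = datetime.strptime(bad, FMT)
    while (b - g).days > 1:
        # pick a whole-day midpoint strictly between g and b
        mid = g + timedelta(days=(b - g).days // 2)
        if is_bad(mid.strftime(FMT)):
            b = mid  # hang reproduced: culprit is at or before mid
        else:
            g = mid  # stayed up: culprit is after mid
    return g.strftime(FMT), b.strftime(FMT)


if __name__ == "__main__":
    # stand-in oracle for demonstration only; a real run means days of uptime per step
    pretend = lambda d: d >= "2006.12.29.00.00.00"
    print(bisect_dates("2006.08.15.00.00.00", "2007.01.22.00.00.00", pretend))
```

since a good kernel can only be declared good after it outlives the longest observed time-to-hang (5 days here), each step is slow, which is the main reason to keep the number of steps down.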

thanks in advance.

====
files under /sys that were changed between 12/28 and 12/29:

Edit src/sys/amd64/amd64/mptable_pci.c
Edit src/sys/amd64/pci/pci_bus.c
Edit src/sys/contrib/dev/ath/public/wackelf.c
Edit src/sys/dev/acpica/acpi_pci.c
Edit src/sys/dev/acpica/acpi_pcib_acpi.c
Edit src/sys/dev/acpica/acpi_pcib_pci.c
Checkout src/sys/dev/ath/if_ath.c
Edit src/sys/dev/cardbus/cardbus.c
Edit src/sys/dev/drm/drm_agpsupport.c
Edit src/sys/dev/pci/pci.c
Edit src/sys/dev/pci/pci_if.m
Edit src/sys/dev/pci/pci_pci.c
Edit src/sys/dev/pci/pci_private.h
Edit src/sys/dev/pci/pcib_private.h
Edit src/sys/dev/pci/pcivar.h
Edit src/sys/i386/i386/mptable_pci.c
Edit src/sys/i386/pci/pci_bus.c
Edit src/sys/kern/subr_bus.c
Checkout src/sys/netgraph/ng_deflate.h
Edit src/sys/pci/agp.c
Edit src/sys/pci/agpreg.h
Edit src/sys/powerpc/ofw/ofw_pcib_pci.c
Edit src/sys/sparc64/pci/apb.c
Edit src/sys/sparc64/pci/ofw_pcib.c
Edit src/sys/sparc64/pci/ofw_pcibus.c
Edit src/sys/sys/param.h
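
a rough way to shorten the list above for an amd64 box is to drop paths that belong to other architectures. a minimal sketch, assuming the "Action path" line layout shown above; the function name and the prefix list are made up for illustration, and it does not check whether a remaining file is actually compiled into this kernel:

```python
# directories for architectures other than amd64 (assumption: none of
# these are built into an amd64 kernel)
OTHER_ARCH_PREFIXES = (
    "src/sys/i386/",
    "src/sys/powerpc/",
    "src/sys/sparc64/",
)


def amd64_candidates(listing):
    """Return paths from an 'Edit path' / 'Checkout path' listing,
    skipping files under other architectures' directories."""
    paths = []
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # blank or unexpected line
        _action, path = parts
        if path.startswith(OTHER_ARCH_PREFIXES):
            continue
        paths.append(path)
    return paths


if __name__ == "__main__":
    sample = """\
Edit src/sys/amd64/pci/pci_bus.c
Edit src/sys/dev/pci/pci.c
Edit src/sys/i386/pci/pci_bus.c
Edit src/sys/sparc64/pci/apb.c
"""
    for p in amd64_candidates(sample):
        print(p)
```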


====
kernel configuration used:

include GENERIC

options SMP

options KDB
options DDB

makeoptions DEBUG=-g
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options DEBUG_LOCKS
options DEBUG_VFS_LOCKS
options DIAGNOSTIC
====



--
Deomid Ryabkov aka Rojer
myself@xxxxxxxxxxx
rojer@xxxxxxxxxxxx
ICQ: 8025844



