Re: Repeatable kernel panic on -CURRENT using ZFS over SATA



Dag-Erling Smørgrav wrote:
> Bill Hacker <askbill@xxxxxxxxxxxxx> writes:
>> Short answer - you are overstressing your very marginal hardware.
>
> You're completely off the mark. Steven is experiencing a well-known bug
> in the ata driver.
>
> DES


In case I can be helpful, I would still like to debug this problem.


Please tell me whether my constant whining on the list is constructive and
helpful in tracking this bug down :)
If it's not, I'd rather let you guys code than answer my emails, but if
I can be of any help, I am willing.

Here's a dump that I captured using -CURRENT as of two nights ago:

Dump header from device /dev/da0s1b
Architecture: i386
Architecture Version: 2
Dump Length: 113577984B (108 MB)
Blocksize: 512
Dumptime: Fri Oct 5 00:37:08 2007
Hostname: scotch.CSUA.Berkeley.EDU
Magic: FreeBSD Kernel Dump
Version String: FreeBSD 7.0-CURRENT #1: Thu Oct 4 06:23:40 PDT 2007
root@xxxxxxxxxxxxxxxxxxxxxxxx:/usr/obj/usr/src/sys/GENERIC
Panic String: from debugger
Dump Parity: 3604782152
Bounds: 2
Dump Status: good



Unread portion of the kernel message buffer:
ad12: FAILURE - device detached
subdisk12: detached
ad12: detached


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x2c
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc07422d6
stack pointer = 0x28:0xd9e98c58
frame pointer = 0x28:0xd9e98c78
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 3 (g_up)
panic: from debugger
cpuid = 0
Uptime: 16m4s
Physical memory: 499 MB
Dumping 108 MB: 93 77 61 45 29 13

#0 doadump () at pcpu.h:195
195 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt
#0 doadump () at pcpu.h:195
#1 0xc074d7ae in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2 0xc074da6b in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3 0xc048cab7 in db_panic (addr=Could not find the frame base for
"db_panic".
) at /usr/src/sys/ddb/db_command.c:433
#4 0xc048d4a5 in db_command_loop () at /usr/src/sys/ddb/db_command.c:401
#5 0xc048ec15 in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:222
#6 0xc07746f6 in kdb_trap (type=12, code=0, tf=0xd9e98c18) at /usr/src/sys/kern/subr_kdb.c:502
#7 0xc0a01aaf in trap_fatal (frame=0xd9e98c18, eva=44) at /usr/src/sys/i386/i386/trap.c:863
#8 0xc0a01ce3 in trap_pfault (frame=0xd9e98c18, usermode=0, eva=44) at /usr/src/sys/i386/i386/trap.c:785
#9 0xc0a02695 in trap (frame=0xd9e98c18) at /usr/src/sys/i386/i386/trap.c:463
#10 0xc09e81fb in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#11 0xc07422d6 in _mtx_lock_flags (m=0x1c, opts=0,
    file=0xc31edd67 "/usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c",
    line=472) at /usr/src/sys/kern/kern_mutex.c:177
#12 0xc31e2fb4 in ?? ()
#13 0x0000001c in ?? ()
#14 0x00000000 in ?? ()
#15 0xc31edd67 in ?? ()
#16 0x000001d8 in ?? ()
#17 0xc788c5ac in ?? ()
#18 0xc31e2f70 in ?? ()
#19 0xc2d9c840 in ?? ()
#20 0xd9e98cbc in ?? ()
#21 0xc07b0d49 in biodone (bp=0x8) at /usr/src/sys/kern/vfs_bio.c:3009
Previous frame identical to this frame (corrupt stack?)
(kgdb) list *0xc07422d6
0xc07422d6 is in _mtx_lock_flags (/usr/src/sys/kern/kern_mutex.c:178).
173 void
174 _mtx_lock_flags(struct mtx *m, int opts, const char *file, int line)
175 {
176
177     MPASS(curthread != NULL);
178     KASSERT(m->mtx_lock != MTX_DESTROYED,
179         ("mtx_lock() of destroyed mutex @ %s:%d", file, line));
180     KASSERT(LOCK_CLASS(&m->lock_object) == &lock_class_mtx_sleep,
181         ("mtx_lock() of spin mutex %s @ %s:%d", m->lock_object.lo_name,
182         file, line));
(kgdb) list *0xc31e2fb4
No source file for address 0xc31e2fb4.
(kgdb) list *0xc07b0d49
0xc07b0d49 is in biodone (/usr/src/sys/kern/vfs_bio.c:3010).
3005 if (done == NULL)
3006 wakeup(bp);
3007 mtx_unlock(&bdonelock);
3008 if (done != NULL)
3009 done(bp);
3010 }
3011
3012 /*
3013 * Wait for a BIO to finish.
3014 *



Interestingly enough, I can't seem to get a useful backtrace: frames #12
through #20 all show up as "??".
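
A side note on the numbers in the trace, offered as a rough reading rather
than a diagnosis: the trap reports a fault virtual address of 0x2c (eva=44
in frame #7), and frame #11 shows _mtx_lock_flags() called with m=0x1c. The
faulting line, kern_mutex.c:178, reads m->mtx_lock. If mtx_lock sits 0x10
bytes into struct mtx on this i386 kernel, the two numbers line up exactly,
which would suggest the mutex pointer itself is bogus (a small offset from
NULL) rather than a valid mutex with damaged contents. The sketch below only
checks that arithmetic. The LockObject32 and Mtx32 layouts are assumptions
made for illustration (four 32-bit words for lock_object, followed by
mtx_lock); they are not copied from the kernel headers.

```python
import ctypes


class LockObject32(ctypes.Structure):
    """ASSUMED i386 layout of struct lock_object: four 32-bit words."""
    _fields_ = [
        ("lo_name", ctypes.c_uint32),     # const char * (32-bit pointer)
        ("lo_type", ctypes.c_uint32),     # const char * (32-bit pointer)
        ("lo_flags", ctypes.c_uint32),    # u_int
        ("lo_witness", ctypes.c_uint32),  # witness data, assumed pointer-sized
    ]


class Mtx32(ctypes.Structure):
    """ASSUMED start of struct mtx: lock_object first, then mtx_lock."""
    _fields_ = [
        ("lock_object", LockObject32),
        ("mtx_lock", ctypes.c_uint32),
    ]


def predicted_fault_address(m):
    """Address touched when reading m->mtx_lock through pointer value m."""
    return m + Mtx32.mtx_lock.offset


M_ARG = 0x1C          # m=0x1c from frame #11 of the backtrace
FAULT_ADDRESS = 0x2C  # "fault virtual address = 0x2c" from the trap message

print(hex(Mtx32.mtx_lock.offset))                        # offset under the assumed layout
print(hex(predicted_fault_address(M_ARG)))               # address the read would hit
print(predicted_fault_address(M_ARG) == FAULT_ADDRESS)   # do the two numbers agree?
```

The real offset for the kernel in question can be checked in kgdb with
something like `print &((struct mtx *)0)->mtx_lock`; if it differs from 0x10,
the reading above does not hold.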

Perhaps someone who knows more about kernel debugging than I do can walk me
through it from here. I read the kernel debugging section of the FreeBSD
Handbook, but it doesn't cover what to do when the stack is seemingly
corrupt :)

I also have a dump from a time when I hotplugged a SATA drive and the
machine instantly panicked. Hotplugging has usually worked, but that time
it just gave up. I'm not sure how interesting that dump is, though, since I
haven't been able to reproduce it (granted, I haven't tried very hard).

-Steven
_______________________________________________
freebsd-current@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@xxxxxxxxxxx"


