Re: panic in deadlkres

2010/6/28 John Baldwin <jhb@xxxxxxxxxxx>:
On Friday 25 June 2010 4:52:22 pm pluknet wrote:
On 25 June 2010 13:50, Anton Yuzhaninov <citrin@xxxxxxxxx> wrote:
I got a panic on 9-CURRENT from Jun 25 2010.

Maybe this is a bug in the deadlock resolver.

panic: blockable sleep lock (sleep mutex) process lock @

db> show alllocks
Process 0 (kernel) thread 0xc4dcd270 (100047)
shared sx allproc (allproc) r = 0 (0xc0885ebc) locked @

db> show lock 0xc4dcd270
 class: spin mutex
 name: D
 flags: {SPIN, RECURSE}
 state: {OWNED}

(kgdb) bt
#0  doadump () at pcpu.h:248
#1  0xc05ae59f in boot (howto=260) at
#2  0xc05ae825 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:590
#3  0xc048ff45 in db_panic (addr=Could not find the frame base for
) at /usr/src/sys/ddb/db_command.c:478
#4  0xc0490533 in db_command (last_cmdp=0xc086ef1c, cmd_table=0x0,
dopager=1) at /usr/src/sys/ddb/db_command.c:445
#5  0xc0490662 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#6  0xc04923ef in db_trap (type=3, code=0) at
#7  0xc05dade6 in kdb_trap (type=3, code=0, tf=0xc4b31bd0) at
#8  0xc078696b in trap (frame=0xc4b31bd0) at
#9  0xc076ca0b in calltrap () at /usr/src/sys/i386/i386/exception.s:165
#10 0xc05daf30 in kdb_enter (why=0xc07ea02d "panic", msg=0xc07ea02d
"panic") at cpufunc.h:71
#11 0xc05ae806 in panic (fmt=0xc07efd94 "blockable sleep lock (%s) %s @
%s:%d") at /usr/src/sys/kern/kern_shutdown.c:573
#12 0xc05ee30b in witness_checkorder (lock=0xc5148088, flags=9,
file=0xc07e3b20 "/usr/src/sys/kern/kern_clock.c", line=203, interlock=0x0)
   at /usr/src/sys/kern/subr_witness.c:1067
#13 0xc05a093c in _mtx_lock_flags (m=0xc5148088, opts=0, file=0xc07e3b20
"/usr/src/sys/kern/kern_clock.c", line=203)
   at /usr/src/sys/kern/kern_mutex.c:200
#14 0xc05706a9 in deadlkres () at /usr/src/sys/kern/kern_clock.c:203
#15 0xc0588721 in fork_exit (callout=0xc05705ea <deadlkres>, arg=0x0,
frame=0xc4b31d38) at /usr/src/sys/kern/kern_fork.c:843
#16 0xc076ca80 in fork_trampoline () at


[Throwing in some ideas (just ignore them if they're dumb; I'm not thinking clearly atm).]

AFAIK, that indicates that some thread already holds a spin mutex
and then tries to acquire a sleep mutex.

Looks like kern/kern_clock.c v1.213 (SVN rev 206482)
has a regression in its handling of ticks wrap-up:
it doesn't release the thread mutex on that path, does it?

This looks like a correct analysis to me.

From subr_witness.c:
1062:                 * Since spin locks include a critical section, this
1063:                 * implicitly enforces a lock order of all sleep locks before
1064:                 * all spin locks.
1065:                 */
1066:                if (td->td_critnest != 0 && !kdb_active)
1067:                        panic("blockable sleep lock (%s) %s @ %s:%d",
1068:                            class->lc_name, lock->lo_name, file, line);
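
To make the check above concrete, here is a minimal userland sketch in C.
It is not kernel code: fake_td, fake_spin_lock, fake_spin_unlock,
fake_sleep_lock_allowed and sleep_lock_ok_after_spin are made-up names.
The only thing taken from subr_witness.c is the condition
`td_critnest != 0 && !kdb_active`. The sketch assumes that holding a spin
lock keeps the critical-section nesting count non-zero, as the quoted
comment implies.

```c
#include <assert.h>

/* Userland model of the WITNESS check; all names here are hypothetical. */
struct fake_td {
	int critnest;	/* models td_critnest: critical section nesting */
};

/* Assumption: taking a spin lock enters a critical section. */
static void
fake_spin_lock(struct fake_td *td)
{
	td->critnest++;
}

static void
fake_spin_unlock(struct fake_td *td)
{
	assert(td->critnest > 0);
	td->critnest--;
}

/* Same condition as the quoted line 1066, but returns 0 instead of panicking. */
static int
fake_sleep_lock_allowed(const struct fake_td *td, int kdb_active)
{
	return (!(td->critnest != 0 && !kdb_active));
}

/* Take a spin lock, optionally drop it, then ask for a sleep lock. */
int
sleep_lock_ok_after_spin(int release_first)
{
	struct fake_td td = { 0 };

	fake_spin_lock(&td);
	if (release_first)
		fake_spin_unlock(&td);
	return (fake_sleep_lock_allowed(&td, 0));
}
```

In this model sleep_lock_ok_after_spin(1) is allowed, while
sleep_lock_ok_after_spin(0) hits the case where the real code panics.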

From kern_clock.c, v1.213 (in several places, while holding a thread lock):
+                                     /* Handle ticks wrap-up. */
+                                     if (ticks < td->td_blktick)
+                                             continue;

Shouldn't it be something like this instead:
+                                     /* Handle ticks wrap-up. */
+                                     if (ticks < td->td_blktick) {
+                                             thread_unlock(td);
+                                             continue;
+                                     }

The precondition for reproducing it, as I see it: lock a subject thread
in some deadlkres callout, hit the ticks wrap-up condition, then try
to lock the process to which the thread belongs in the (n+m)'th deadlkres
callout, or in a different context.
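
The suspected leak can be modeled outside the kernel. What follows is a
minimal userland sketch, not the real deadlkres loop: fake_thread,
fake_thread_lock, fake_thread_unlock, held_locks and leaked_locks are
invented names, and the tick values are arbitrary. It only shows that a
`continue` taken between lock and unlock leaves a lock held, and that
unlocking before the `continue` brings the count back to zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel thread; not the real struct thread. */
struct fake_thread {
	int blktick;	/* tick at which the thread blocked */
	int locked;	/* 1 while the per-thread lock is held */
};

static int held_locks;	/* locks currently held by the scanning loop */

static void
fake_thread_lock(struct fake_thread *td)
{
	assert(!td->locked);
	td->locked = 1;
	held_locks++;
}

static void
fake_thread_unlock(struct fake_thread *td)
{
	assert(td->locked);
	td->locked = 0;
	held_locks--;
}

/* Scan three fake threads; return how many locks are still held afterwards. */
int
leaked_locks(int apply_fix)
{
	/* The second entry has blktick > ticks, simulating a wrapped counter. */
	struct fake_thread tds[3] = { { 10, 0 }, { 500, 0 }, { 20, 0 } };
	int ticks = 100;
	size_t i;

	held_locks = 0;
	for (i = 0; i < sizeof(tds) / sizeof(tds[0]); i++) {
		struct fake_thread *td = &tds[i];

		fake_thread_lock(td);
		/* Handle ticks wrap-up. */
		if (ticks < td->blktick) {
			if (apply_fix)
				fake_thread_unlock(td);	/* the proposed fix */
			continue;	/* without the fix, the lock leaks here */
		}
		fake_thread_unlock(td);
	}
	return (held_locks);
}
```

Here leaked_locks(0) returns 1 and leaked_locks(1) returns 0. If the real
thread lock is a spin lock, a leaked one would presumably leave td_critnest
non-zero, which is what the witness check trips over on the next sleep
mutex acquisition.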

Thanks, that may be fixed in r209577.


