Re: reproducible "panic: share->excl"



On Mon, Jul 21, 2008 at 05:03:14PM -0400, Andrew Gallatin wrote:
I can panic today's -current reliably (or hang it with
WITNESS/INVARIANTS disabled). When it crashes, I see
the appended panic messages.

It seems to be 100% reproducible on my box (AMD64 x2,
512MB ram, UFS2). If anybody savvy in this area would
like to reproduce it, I've left the program at ~gallatin/ahunt.c
on freefall. Compile it, and run it as:
./a.out -mmbfileinit -madvise=/var/tmp/zot -random -size=95536 -touch=4096 -rewrite=2


Cheers,

Drew

PS: Here is a serial console log from the panic:
...

login: shared lock of (lockmgr) ufs @ kern/vfs_subr.c:2044
while exclusively locked from kern/vfs_vnops.c:593
panic: share->excl
cpuid = 1
KDB: enter: panic
[thread pid 1702 tid 100149 ]
Stopped at kdb_enter+0x3d: movq $0,0x639958(%rip)
db> tr
Tracing pid 1702 tid 100149 td 0xffffff000d08f000
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x176
witness_checkorder() at witness_checkorder+0x137
__lockmgr_args() at __lockmgr_args+0xc74
ffs_lock() at ffs_lock+0x8c
VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9b
_vn_lock() at _vn_lock+0x47
vget() at vget+0x7b
vnode_pager_lock() at vnode_pager_lock+0x146
vm_fault() at vm_fault+0x1e2
trap_pfault() at trap_pfault+0x128
trap() at trap+0x395
calltrap() at calltrap+0x8
--- trap 0xc, rip = 0xffffffff8079f2bd, rsp = 0xfffffffe58c2f7b0, rbp =
0xfffffffe58c2f830 ---
copyin() at copyin+0x3d
ffs_write() at ffs_write+0x2f8
VOP_WRITE_APV() at VOP_WRITE_APV+0x10b
vn_write() at vn_write+0x23f
dofilewrite() at dofilewrite+0x85
kern_writev() at kern_writev+0x60
write() at write+0x54
syscall() at syscall+0x1dd
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (4, FreeBSD ELF64, write), rip = 0x8007296ec, rsp =
0x7fffffffe158, rbp = 0x7fffffffe210 ---
db> show locks
exclusive sleep mutex vnode interlock r = 0 (0xffffff000d0dc0c0) locked
@ vm/vnode_pager.c:1199
exclusive sx user map r = 0 (0xffffff000d054360) locked @ vm/vm_map.c:3115
exclusive lockmgr bufwait r = 0 (0xfffffffe5047f278) locked @
kern/vfs_bio.c:1783
exclusive lockmgr ufs r = 0 (0xffffff000d0dc098) locked @
kern/vfs_vnops.c:593
db>

Essentially, you tried to write part of a region mmapped from the file
back to that same file. VOP_WRITE() is called with the vnode exclusively
locked, while the fault handler tries to lock the same vnode in shared
mode in order to page in the data.
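
To make the access pattern concrete, here is a minimal userland sketch of
it. This is not ahunt.c; the helper name, the scratch path template and the
use of pwrite() are my own assumptions for illustration. It maps a file and
then passes the mapping as the source buffer of a write to the same file, so
the kernel may have to fault in the file's pages from inside the write path.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from ahunt.c): create a scratch file of
 * `len` bytes, map it, and write the mapping back into the same file.
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t
write_mapping_to_own_file(size_t len)
{
	char path[] = "/tmp/mmapwrite.XXXXXX";	/* assumed scratch location */
	ssize_t n = -1;
	void *map;
	int fd;

	fd = mkstemp(path);
	if (fd == -1)
		return (-1);
	unlink(path);			/* file disappears once fd is closed */

	/* Give the file a size so that the mapping is backed by it. */
	if (ftruncate(fd, (off_t)len) == -1)
		goto out;

	map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	/*
	 * The source buffer is the file's own mapping. Pages that are not
	 * resident are faulted in while the kernel copies from this buffer,
	 * that is, while the write path already holds the vnode lock.
	 */
	n = pwrite(fd, map, len, 0);

	munmap(map, len);
out:
	close(fd);
	return (n);
}
```

Calling write_mapping_to_own_file(4096) from a small main() is enough to
exercise the pattern. On a kernel without the problem it simply returns the
byte count; I have not checked whether this reduced case alone triggers the
panic the way Drew's full test program does.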

The following change fixed it for me.
Attilio, would it make sense to treat LK_CANRECURSE | LK_SHARED as a
request for the exclusive lock when the current thread already holds the
exclusive lock? I think this would be the proper solution.

diff --git a/sys/vm/vnode_pager.c b/sys/vm/vnode_pager.c
index 4758456..61f4fd9 100644
--- a/sys/vm/vnode_pager.c
+++ b/sys/vm/vnode_pager.c
@@ -1179,6 +1179,7 @@ vnode_pager_lock(vm_object_t first_object)
 {
 	struct vnode *vp;
 	vm_object_t backing_object, object;
+	int locked, lockf;

 	VM_OBJECT_LOCK_ASSERT(first_object, MA_OWNED);
 	for (object = first_object; object != NULL; object = backing_object) {
@@ -1196,13 +1197,19 @@ vnode_pager_lock(vm_object_t first_object)
 			return NULL;
 		}
 		vp = object->handle;
+		locked = VOP_ISLOCKED(vp);
 		VI_LOCK(vp);
 		VM_OBJECT_UNLOCK(object);
 		if (first_object != object)
 			VM_OBJECT_UNLOCK(first_object);
 		VFS_ASSERT_GIANT(vp->v_mount);
-		if (vget(vp, LK_CANRECURSE | LK_INTERLOCK |
-		    LK_RETRY | LK_SHARED, curthread)) {
+		if (locked == LK_EXCLUSIVE)
+			lockf = LK_CANRECURSE | LK_INTERLOCK | LK_RETRY |
+			    LK_EXCLUSIVE;
+		else
+			lockf = LK_CANRECURSE | LK_INTERLOCK | LK_RETRY |
+			    LK_SHARED;
+		if (vget(vp, lockf, curthread)) {
 			VM_OBJECT_LOCK(first_object);
 			if (object != first_object)
 				VM_OBJECT_LOCK(object);
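
For discussion, the alternative asked about above (having the lock layer
itself turn LK_CANRECURSE | LK_SHARED into an exclusive request) could be
modelled roughly as follows. This is a standalone sketch, not lockmgr code:
the MODEL_* flag values and the function name are made up for illustration,
and the real LK_* constants are defined in sys/lockmgr.h.

```c
#include <stdbool.h>

/* Illustrative flag values only; they are NOT the real LK_* constants. */
enum {
	MODEL_LK_SHARED     = 0x01,
	MODEL_LK_EXCLUSIVE  = 0x02,
	MODEL_LK_CANRECURSE = 0x04
};

/*
 * Hypothetical model of the proposal: if the requesting thread already
 * holds the lock exclusively and asks for a recursable shared lock,
 * treat the request as a recursive exclusive one. Every other request
 * is passed through unchanged.
 */
int
model_effective_request(int flags, bool curthread_holds_excl)
{
	if (curthread_holds_excl &&
	    (flags & MODEL_LK_CANRECURSE) != 0 &&
	    (flags & MODEL_LK_SHARED) != 0) {
		flags &= ~MODEL_LK_SHARED;
		flags |= MODEL_LK_EXCLUSIVE;
	}
	return (flags);
}
```

With something like this in the lock layer, callers such as
vnode_pager_lock() would not need to check VOP_ISLOCKED() themselves, which
is what the patch above does by hand.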
