Re: HEADS UP: UNIX domain socket locking changes merged to CVS HEAD




On Wed, 28 Feb 2007, Stephane E. Potvin wrote:

Please let me know if you experience any problems with UNIX domain sockets -- these changes will affect applications that consume UNIX domain sockets directly, like MySQL and Postfix, as well as consumers of POSIX fifos, which are implemented using UNIX domain sockets in-kernel.

Since this commit, I've been observing frequent deadlocks on my laptop, mostly when starting up GNOME. It usually takes less than 5 to 10 minutes for the deadlock to happen.

I was able to drop into ddb once and got the following information: (there might be some typos as I had to copy this manually)

Thanks, this information was very helpful, and indeed the problem is as you surmise: cases existed where more than one unpcb lock was acquired at a time when holding only a global read lock, not a global write lock. I guess these slipped through from an earlier version of the patch. In any case, could you try the patch at:

http://www.watson.org/~robert/freebsd/netperf/20070228-unp_deadlock.diff

This eliminates overlapped unpcb lock acquisition in both datagram and stream cases, and with any luck will fix the deadlock problem. It may also marginally improve performance by further reducing unpcb lock contention.

Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge


show alllocks
Process 906 (gnome-power-manager) thread 0xc553c570 (100126)
exclusive sleep mutex unp_mtx r = 0 (0xc5573bb8) locked @ /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:849
shared rw unp_global_rwlock r = 0 (0xc06d1dac) locked @ /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:768
Process 860 (dbus-daemon) thread 0xc4d001d0 (100095)
exclusive sleep mutex unp_mtx r = 0 (0xc5573b10) locked @ /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:849
shared rw unp_global_rwlock r = 0 (0xc06d1dac) locked @ /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:768

show lock 0xc5573bb8
class: sleep mutex
name: unp_mtx
flags: {DEF, RECURSE, DUPOK}
state: {OWNED, CONTESTED}
owner: 0xc553c570 (tid 100126, pid 906, "gnome-power-manager")

show turnstile 0xc5573bb8
Lock: 0xc5573bb8 - (sleep mutex) unp_mtx
Lock Owner: 0xc553c570 (tid 100126, pid 906, "gnome-power-manager")
Shared Waiters:
empty
Exclusive Waiters:
0xc4d001d0 (tid 100095, pid 860, "dbus-daemon")
Pending Threads:
empty

show lock 0xc5573b10
class: sleep mutex
name: unp_mtx
flags: {DEF, RECURSE, DUPOK}
state: {OWNED, CONTESTED}
owner: 0xc4d001d0 (tid 100095, pid 860, "dbus-daemon")

show turnstile 0xc5573b10
Lock: 0xc5573b10 - (sleep mutex) unp_mtx
Lock Owner: 0xc4d001d0 (tid 100095, pid 860, "dbus-daemon")
Shared Waiters:
empty
Exclusive Waiters:
0xc553c570 (tid 100126, pid 906, "gnome-power-manager")
Pending Threads:
empty

show lock 0xc06d1dac
class: rw
name: unp_global_rwlock
state: RLOCK: 2 locks
waiters: writers

show turnstile 0xc06d1dac
Lock: 0xc06d1dac - (rw) unp_global_rwlock
Lock Owner: none
Shared Waiters:
empty
Exclusive Waiters:
0xc4d00000 (tid 100096, pid 857, "gconfd-2")
0xc4d01570 (tid 100085, pid 804, "login")
0xc4fcaae0 (tid 100133, pid 887, "bonobo-activation-s")
0xc48c23a0 (tid 100106, pid 897, "gaim")
0xc4d01910 (tid 100120, pid 909, "gnome-screensaver")
0xc553cae0 (tid 100123, pid 905, "gnome-mount")
Pending Threads:
empty

bt 100095
Tracing pid 860 tid 100095 td 0xc4d001d0
sched_switch(3301966288,0,1,3226391662,3310601584,...) at 3226314602 = sched_switch+303
mi_switch(1,0,3227647346,647,3228084884,...) at 3226245932 = mi_switch+489
turnstile_wait(3310828472,3310601584,0,3310601586,3310828472,...) at 3226393861 = turnstile_wait+633
_mtx_lock_sleep(3310828472,3301966288,0,3227660663,877,...) at 3226177946 = _mtx_lock_sleep+261
_mtx_lock_flags(3310828472,0,3227660663,877,3310833112,...) at 3226177102 = _mtx_lock_flags+102
uipc_send(3310832888,0,3296484864,0,0,...) at 3226561343 = uipc_send+1058
sosend_generic(3310832888,0,3302262848,3296484864,0,...) at 3226529764 = sosend_generic+1067
sosend(3310832888,0,3302262848,0,0,...) at 3226530139 = sosend+63
soo_write(3304721288,3302262848,3297254528,0,3301966288,...) at 3226433647 = soo_write+121
dofilewrite
kern_writev
writev
syscall

bt 100126
Tracing pid 906 tid 100126 td 0xc553c570
sched_switch
mi_switch
turnstile_wait
_mtx_lock_sleep
_mtx_lock_flags
uipc_send
sosend_generic
sosend
soo_write
dofilewrite
kern_writev
writev
syscall

As you can see, threads 100095 and 100126 are each waiting on a lock that the other holds. The function uipc_send tries to lock two unp_mtx mutexes without holding a write lock on unp_global_rwlock. It seems that uipc_send takes the write lock only if nam is not NULL or the PRUS_EOF flag is set. Both of these conditions are false in this particular call scenario. According to the comments just above the second lock in uipc_usrreq.c, the global write lock should already be acquired by the time we get there. I'm not sure where or under what condition the write lock should be acquired to correctly fix this. I'll keep the core around in case you want me to provide more information.
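To make the reasoning explicit, the "show lock" and "show turnstile" output above can be read as a wait-for graph: each lock has an owner, and each blocked thread waits for one lock. The sketch below is plain userland C, not anything ddb provides. The struct and function names are hypothetical, and only the tids and lock addresses are taken from the trace. It follows the owner chain from one thread and reports whether it leads back to the same thread.

```c
#include <assert.h>
#include <stddef.h>

struct lock_rec;

/* One thread from the trace and the lock it is blocked on (NULL if none). */
struct thread_rec {
	int			 tid;
	const struct lock_rec	*waits_for;
};

/* One lock from the trace and its current owner (NULL if unowned). */
struct lock_rec {
	unsigned long		 addr;
	const struct thread_rec	*owner;
};

/*
 * Follow thread -> lock it waits for -> owner of that lock, for at most
 * max_steps hops.  Returns 1 if the chain leads back to 'start'.
 */
int
in_deadlock_cycle(const struct thread_rec *start, int max_steps)
{
	const struct thread_rec *t = start;

	assert(max_steps > 0);
	for (int i = 0; i < max_steps; i++) {
		if (t->waits_for == NULL || t->waits_for->owner == NULL)
			return (0);
		t = t->waits_for->owner;
		if (t == start)
			return (1);
	}
	return (0);
}

/* Encode the two unp_mtx records copied from the ddb session. */
int
trace_shows_deadlock(void)
{
	struct thread_rec gpm = { 100126, NULL };	/* gnome-power-manager */
	struct thread_rec dbus = { 100095, NULL };	/* dbus-daemon */
	struct lock_rec a = { 0xc5573bb8UL, &gpm };
	struct lock_rec b = { 0xc5573b10UL, &dbus };

	dbus.waits_for = &a;	/* exclusive waiter on 0xc5573bb8 */
	gpm.waits_for = &b;	/* exclusive waiter on 0xc5573b10 */

	return (in_deadlock_cycle(&gpm, 8) && in_deadlock_cycle(&dbus, 8));
}
```

With the records above, the chain from either thread returns to itself after two hops, which is the cycle described in the paragraph.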

Regards,

Steph

_______________________________________________
freebsd-current@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@xxxxxxxxxxx"
