Re: Call for Testers: Convert watchdog spinlock into a sleepable mutex in if_ndis



On Mon, 2008-06-09 at 08:03 -0400, Coleman Kane wrote:
On Sun, 2008-06-08 at 03:55 +0200, Paul B. Mahol wrote:
On 6/7/08, Coleman Kane <cokane@xxxxxxxxxxx> wrote:
Hi,

I've got another patch to if_ndis and I'd like to know if any NDIS users
out there can test it for me and give me some feedback. I've converted
the ndis_spinlock in if_ndis into a normal sleepable mutex, and would
like to get some testers. The idea here is to eliminate an unnecessary
"spinning" in the kernel, where it would be preferable to sleep (for CPU
usage, interactivity, etc...).

You can download it here:
*
http://people.freebsd.org/~cokane/patches/if_ndis-spinlock-to-mtx.patch

--
Coleman Kane


I did not found any stability regressions in my environment with linked patch.

But I noticed new LOR (it appears as soon as radio of device is turned on):

lock order reversal:
1st 0xc401726c ndis0 (network driver) @
/usr/local/src/sys/modules/if_ndis/../../dev/if_ndis/if_ndis.c:1648
2nd 0xc09b3e74 HAL preemption lock (HAL lock) @
/usr/local/src/sys/modules/ndis/../../compat/ndis/subr_hal.c:423
KDB: stack backtrace:
db_trace_self_wrapper(c0708534,c3985bcc,c0541a6e,c070adf6,c09b3e74,...)
at db_trace_self_wrapper+0x26
kdb_backtrace(c070adf6,c09b3e74,c09b0a51,c09b0a48,c09b09f8,...) at
kdb_backtrace+0x29
witness_checkorder(c09b3e74,9,c09b09f8,1a7,c04f7364,...) at
witness_checkorder+0x6de
_mtx_lock_flags(c09b3e74,0,c09b09f8,1a7,c457ebd0,...) at _mtx_lock_flags+0xbc
KfRaiseIrql(2,c457ebd0,c3985c48,c09aacdd,c3d45354,...) at KfRaiseIrql+0x63
KfAcquireSpinLock(c3d45354,c094888a,c4017200,c3d6cc40,c4017200,...) at
KfAcquireSpinLock+0x13
IoQueueWorkItem(c457ebd0,c3d6cc40,0,c4017200,c07040ef,...) at
IoQueueWorkItem+0x2d
ndis_tick(c4017200,0,c070699c,166,c077e594,...) at ndis_tick+0x143
softclock(c077e560,0,c0701ca6,4d4,c3cda6ec,...) at softclock+0x24a
ithread_loop(c3cdb2e0,c3985d38,c0701a09,324,c3c99a70,...) at ithread_loop+0x1b5
fork_exit(c04e74f0,c3cdb2e0,c3985d38) at fork_exit+0xb8
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xc3985d70, ebp = 0 ---

Try this one:
* http://people.freebsd.org/~cokane/patches/if_ndis-spinlock-to-mtx2.patch

There's the same LOR in the existing code as well, but it is hidden by
WITNESS_SKIPSPIN.

If you can verify that this works, I'll just commit them both as a
single update.


Paul,

Ignore the previous patch (#2) and try this one instead:
* http://people.freebsd.org/~cokane/patches/if_ndis-spinlock-to-mtx3.patch

I analyzed the code a bit more and decided to push the mtx acquisition
so that it is inside the test for the TX-watchdog case. This should
result in having less traffic on that mutex under normal running
conditions. In this case, the mutex is only locked inside the
ndis_ticktask, and also inside the TX-watchdog handling code. In
general, when neither of these cases are true and the only action is a
simple re-schedule of the callout, the lock never gets acquired.

thompsa/jhb: What do you think of these last tweaks? I did some analysis
and I don't think that the ndis_hang_timer (rw), nmb_checkforhangsecs
(ro), and ndis_tx_timer (rw) are themselves subject to any races in the
current design (the rw/ro indicate their use within the unlocked
ndis_tick context).

Furthermore, as callout_drain(9) is the only method acting upon the
callout in any multithreaded context, it should be safe to perform
callout_reset(9) on the unlocked structure. I also did some more
research (just now) and it looks like ndis_start (called by
ndis_starttask) and ndis_reset_nic (in sys/compat/ndis/kern_ndis.c,
called by ndis_resettask) both perform their own locking. It is looking
to me like we may not even need to acquire NDIS_LOCK() from inside
ndis_tick() at all, but instead just use the existing locking in the
task-specific functions that it calls in wd/timeout cases.

--
Coleman Kane

Attachment: signature.asc
Description: This is a digitally signed message part