Re: LOR on sleepqueue chain locks, Was: LOR sleepq/scrlock



On Friday 09 May 2008 10:53:15 pm Aristedes Maniatis wrote:

On 23/04/2008, at 3:34 AM, John Baldwin wrote:

The real problem at the bottom of the screen though is a real issue.
It's a LOR of two different sleepqueue chain locks. The problem is that
when setrunnable() encounters a swapped out thread it tries to wakeup
proc0, but if proc0 is asleep (which is typical) then its thread lock is
a sleep queue chain lock, so waking up a swapped out thread from
wakeup() will usually trigger this LOR.

I think the best fix is to not have setrunnable() kick proc0 directly.
Perhaps setrunnable() should return an int and return true if proc0
needs to be awakened and false otherwise. Then the sleepq code (b/c only
sleeping threads can be swapped out anyway) can return that value from
sleepq_resume_thread() and can call kick_proc0() directly once it has
dropped all of its own locks.
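
The deferral described above can be sketched in a small user-space model.
This is only an illustration, not the kernel code and not the contents of
any patch: every model_* name, the lock counter, and the struct are
hypothetical stand-ins, and the "locks" are plain counters in a
single-threaded model so that the ordering rule can be checked with
assert().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: number of sleepqueue chain locks currently held. */
static int chain_locks_held;
/* Hypothetical model: how many times proc0 has been kicked. */
static int proc0_kicks;

/* Hypothetical stand-in for a thread that may have been swapped out. */
struct model_thread {
	bool swapped_out;
	bool runnable;
};

static void
model_chain_lock(void)
{
	chain_locks_held++;
}

static void
model_chain_unlock(void)
{
	assert(chain_locks_held > 0);
	chain_locks_held--;
}

/*
 * Models kick_proc0(): waking proc0 needs proc0's own chain lock, so it
 * must be entered with no other chain lock held to avoid the LOR.
 */
static void
model_kick_proc0(void)
{
	assert(chain_locks_held == 0);
	model_chain_lock();
	proc0_kicks++;
	model_chain_unlock();
}

/*
 * Models the proposed setrunnable(): called with the woken thread's
 * chain lock held, it never touches proc0 and instead returns nonzero
 * when the caller should kick proc0 later.
 */
static int
model_setrunnable(struct model_thread *td)
{
	assert(chain_locks_held == 1);
	td->runnable = true;
	return (td->swapped_out ? 1 : 0);
}

/* Models the sleepq wakeup path: remember the result, unlock, then kick. */
static void
model_wakeup(struct model_thread *td)
{
	int need_kick;

	model_chain_lock();
	need_kick = model_setrunnable(td);
	model_chain_unlock();
	if (need_kick)
		model_kick_proc0();
}
```

Calling model_wakeup() on a thread with swapped_out set kicks proc0
exactly once and only after the first chain lock has been released; a
resident thread causes no kick at all.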

--
John Baldwin

The way you describe it, it almost sounds like this LOR should be
happening for everyone, all the time. To try and eliminate the factors
which trigger it for us, we tried the following: removed PAE from
kernel, disabled PF. Neither of these things made any difference and
the error is fairly quickly reproducible (within a couple of hours
running various things to load the machine). The one thing we did not
test yet is removing ZFS from the picture. Note also that this box ran
for years and years on FreeBSD 4.x without a hiccup (non PAE, ipfw
instead of pf and no ZFS of course).

There are two things. 1) Most people who run witness (that I know of)
don't run it on spinlocks because of the overhead, so LORs of spin locks
are less well-reported than LORs of other locks (mutexes, rwlocks,
etc.). 2) You have to have enough load on the box to swap out active
processes to get into this situation. Between those I think that is why
this is not more widely reported.


Hi John,

Thanks for your efforts so far to track this LOR down. I've been
keeping an eye on cvs logs, but haven't seen anything which looks like
a patch for this.

* is this still outstanding?
* or will it be addressed soon?
* if not, should I create a PR so that it doesn't get forgotten?
* in our case, although we can trigger it quickly with some load, the
problem occurs (and causes a complete machine lock) even under < 10%
load. Not sure if the combination of PAE/ZFS/SCHED ULE exacerbates
that in any way compared to a 'standard' build.

Try http://www.FreeBSD.org/~jhb/patches/sleepq.patch

--
John Baldwin
_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"


