Re: rwlocks: poor performance with adaptive spinning



On Monday 24 September 2007 04:57:06 pm Jeff Roberson wrote:
On Mon, 24 Sep 2007, John Baldwin wrote:

On Saturday 22 September 2007 10:32:06 pm Attilio Rao wrote:
Recently several people have reported starvation problems with
rwlocks.
In particular, users who tried to use rwlocks on large SMP systems
(16+ CPUs) found them prone to poor performance and to
starvation of waiters.

Inspecting the code, something strange about adaptive spinning popped
up: for rwlocks, the adaptive spinning stubs seem to be placed too far
down in the decision loop.
With the stub placed there, a thread that is about to spin adaptively
first sets the respective (read or write) waiters flag, which means
that the owner of the lock will take the hard path of the locking
functions and perform a full wakeup even if the waiter queues turn out
to be empty. This is a big penalty for adaptive spinning and can make
it completely useless.
In addition to this, adaptive spinning only runs in the turnstile
spinlock path, which is not ideal.
This patch ports the approach already used for adaptive spinning in sx
locks to rwlocks:
http://users.gufi.org/~rookie/works/patches/kern_rwlock.diff
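
To make the ordering issue concrete, here is a minimal userland sketch
using C11 atomics. It is not the patch: every toy_* name is
hypothetical, owner_running is only a stand-in for the kernel's check
that the owning thread is currently on a CPU, and max_spins exists
just to keep the demonstration bounded. The point it illustrates is
that the waiters flag is set only after spinning has been given up, so
an owner that releases the lock while others are merely spinning can
still use the fast path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_UNLOCKED 0u
#define TOY_WAITERS  1u  /* low bit: a thread is about to block on the lock */

struct toy_lock {
    _Atomic uintptr_t word;     /* owner id (even, nonzero) | TOY_WAITERS */
    atomic_bool owner_running;  /* assumption: models "owner is on a CPU" */
};

enum toy_result { TOY_ACQUIRED, TOY_MUST_BLOCK };

/* Spin first; set the waiters flag only once spinning is abandoned. */
static enum toy_result
toy_wlock(struct toy_lock *l, uintptr_t self, unsigned max_spins)
{
    unsigned spins = 0;

    assert(self != 0 && (self & TOY_WAITERS) == 0);
    for (;;) {
        uintptr_t v = TOY_UNLOCKED;

        if (atomic_compare_exchange_strong(&l->word, &v, self))
            return TOY_ACQUIRED;
        /* v now holds the observed lock word. */
        if (atomic_load(&l->owner_running) && spins < max_spins) {
            spins++;            /* spin without touching the flag */
            continue;
        }
        /* Giving up on spinning: only now advertise a waiter. */
        if (atomic_compare_exchange_strong(&l->word, &v, v | TOY_WAITERS))
            return TOY_MUST_BLOCK;
        /* The word changed under us; retry from the top. */
    }
}

/* Fast-path unlock; false means the waiters flag forces the hard path. */
static bool
toy_wunlock_fast(struct toy_lock *l, uintptr_t self)
{
    uintptr_t v = self;

    return atomic_compare_exchange_strong(&l->word, &v, TOY_UNLOCKED);
}
```

If the flag were set before the spin instead, toy_wunlock_fast() would
fail for the owner even though nobody is actually asleep, which is the
penalty described above.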

In sx locks big benefits are unlikely because they are held for too
long, but for rwlocks the situation is rather different.
I would like to see if people can benchmark this patch (maybe
in private environments?) as I'm not able to do so in the short term.

Adaptive spinning in rwlocks can be improved further with other tricks
(like adding a backoff counter, for example, or trying to spin with
the lock held in read mode too), but we first should be sure to start
with a solid base.
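
As a rough illustration of the backoff idea, the following userland
sketch doubles the pause between failed attempts up to a cap. The
toy_* names, the start and cap values, and the 0/1 lock word are all
assumptions made for the example and are not taken from the patch; in
the kernel the busy loop would be something like cpu_spinwait().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical tuning knobs, chosen only for the example. */
#define TOY_BACKOFF_START 1u
#define TOY_BACKOFF_CAP   64u

/* Double the delay after a failed attempt, saturating at the cap. */
static unsigned
toy_backoff_next(unsigned delay)
{
    assert(delay >= TOY_BACKOFF_START && delay <= TOY_BACKOFF_CAP);
    return (delay > TOY_BACKOFF_CAP / 2) ? TOY_BACKOFF_CAP : delay * 2;
}

/* Burn roughly `delay` loop iterations between attempts. */
static void
toy_pause(unsigned delay)
{
    for (volatile unsigned i = 0; i < delay; i++)
        ;
}

/*
 * Try to take a 0/1 lock word, backing off between attempts.
 * Returns true on success, false after max_attempts failures.
 */
static bool
toy_spin_backoff(atomic_uint *word, unsigned max_attempts)
{
    unsigned delay = TOY_BACKOFF_START;

    for (unsigned n = 0; n < max_attempts; n++) {
        unsigned expected = 0;

        if (atomic_compare_exchange_strong(word, &expected, 1u))
            return true;
        toy_pause(delay);
        delay = toy_backoff_next(delay);
    }
    return false;
}
```

The growing pause reduces how often contending CPUs hit the lock's
cache line; whether that helps rwlocks in practice is exactly what
would need benchmarking.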

I did this for mutexes and rwlocks over a year ago and Kris found it was
slower in benchmarks. www.freebsd.org/~jhb/patches/lock_adapt.patch is the
last thing I sent kris@ to test (it only has the mutex changes). This might
be more optimal post-thread_lock since thread_lock seems to have heavily
pessimized adaptive spinning because it now enqueues the thread and then
dequeues it again before doing the adaptive spin. I liked the approach
originally because it simplifies the code a lot. A separate issue is that
writers don't spin at all if a reader holds the lock, and I think one thing
to test for that would be an adaptive spin with a static timeout.
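
A writer spinning on a read-held lock has no single owner thread to
watch, so the spin has to be bounded some other way. A minimal
userland sketch of the static-timeout variant follows; the toy_* name,
the lock-word encoding, and the iteration-count timeout are all
assumptions for illustration, not the kernel's layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding: high bit = write-locked, low bits = readers. */
#define TOY_WRITER (1u << 31)

/*
 * Writer-side spin with a fixed iteration budget. Returns true if the
 * write lock was taken, false if the caller should fall back to the
 * blocking path.
 */
static bool
toy_wlock_spin_readers(atomic_uint *word, unsigned timeout)
{
    assert(word != NULL);
    for (unsigned i = 0; i <= timeout; i++) {
        unsigned v = 0;

        /* Succeeds only when there are no readers and no writer. */
        if (atomic_compare_exchange_strong(word, &v, TOY_WRITER))
            return true;
        if (v & TOY_WRITER)
            break;  /* write-held: owner-based spinning applies instead */
    }
    return false;
}
```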

We don't enqueue the thread until the same place. We just acquire an
extra spinlock. The thread is not enqueued until turnstile_wait() as
before.

Oh. That's what I get for assuming what trywait() and cancel() did based on
their names. It is still more overhead than before though, so simplifying
adaptive spinning might still be a win now as opposed to before.

--
John Baldwin


