Re: race on multi-processor solaris
From: Jonathan Adams (jonathan-ggl_at_ofb.net)
Date: 12/12/03
- Next message: Richard L. Hamilton: "Re: system accounting and space on /var"
- Previous message: Dimitri Maziuk: "Re: solaris 9 ssh ahangs on exit ?"
- In reply to: Joe Seigh: "Re: race on multi-processor solaris"
- Next in thread: Jonathan Adams: "Re: race on multi-processor solaris"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: 11 Dec 2003 15:25:52 -0800
Joe Seigh <jseigh_01@xemaps.com> wrote in message news:<3FD7077F.5A681A58@xemaps.com>...
> Jonathan Adams wrote:
> >
> > Joe Seigh <jseigh_01@xemaps.com> wrote:
> > > "Casper H.S. ***" wrote:
> > > >
> > > > It's fairly; the blocker sets the waiters bit then searches for
> > > > the locking thread on all CPUs and then verifies that the thread
> > > > is still the owner and the waiters bit is still set before deciding
> > > > to block; using intervening membars and other properties of the system,
> > > > we can tell that this is right. The system has one other interesting
> > > > property: it will make sure that the criticial part of mutex_exit()
> > > > will be re-run if it was preempted.
> > > >
> > >
> > > So the system has to make timing guarantees to ensure that the
> > > race condition will not occur.
> >
> > I'm not sure what you are referring to, here. Not as I understand
> > the term "timing guarantees".
> >
>
> A race condition exists because of timing dependencies. If you can
> make certain timing guarantees then you can eliminate the race condition.
> So if the mutex exit takes time T to execute, including the time that
> it takes for the store of the lockword to become visible to the waiting
> thread, then all the waiting thread has to do avoid the race condition
> after setting the wait bit is wait longer than T and examine the lockword
> again. If the lockword hasn't changed then the mutex code will see the
> wait bit. The restart mutex_exit on interrupt provides the <= T guarantee
> if the thread is suspended during mutex_exit. I don't know what the
> examining other processors for running lock holder is about since you'd
> want to block if the lock holder is not running.
That's not at all what Solaris is doing. To fully explain, I'm going
to have to dump out a bunch of Solaris kernel internals. Let me know
if I missed something, or am unclear.
We're in the kernel, and there is a CPU structure for each CPU.
Active CPUs (those capable of running threads) are on a doubly-linked
"active" list. There is a thread structure for every thread, and the
address of that structure uniquely defines the thread during its
lifetime. One member of the CPU is a pointer to the thread which is
currently executing on it.
The constraint on this is that whenever a thread goes on-CPU (i.e. the
CPU structure is pointed to it and we resume its execution), and it
was in mutex_exit()'s critical section, its program counter is reset
to the beginning of mutex_exit(). Most interrupts ("low level
interrupts") are handled by "interrupt threads", which are allowed to
block, etc. Handling these effectively does a context switch to the
interrupt thread, then back to the interrupted thread when the
interrupt thread either finishes handling the interrupt or blocks.
In practice, resuming from almost every type of interrupt goes through
the same path, so we reset the PC most of the time. (there are some
cases where we don't, but they are rare.)
(In the below, "we" is the thread trying to block on the mutex, and
"he" is the owner)
"Check to see if owner is running on any CPU" is simply "walk the
active CPU list, seeing if any of them have owner as their active
thread". We do this twice -- first to see if we should block, and
again after setting the waiters bit.
The owner of the lock *only* changes it by calling mutex_exit().
Before we set the waiters bit, we grab the lock protecting the lock's
sleep queue chain (in Solaris, the turnstile hash bucket lock). This
does three things:
1. If the owner reaches the slow path of mutex_exit(), we'll be fine
-- he won't be able to proceed until we're done, and so he'll see us
if we block.
2. It keeps every other thread from mucking with the waiters bit.
3. It gives us dispatcher interrupt priority, so we cannot context
switch or be interrupted by low-level interrupts. (this isn't used in
the below algorithm)
The race we have to avoid is the owner waking up *right before* we set
the waiters bit, getting into the mutex_exit() fast-path, and not
seeing the change in the lock. Because we reset to the beginning of
mutex_exit() on context switches, we know that if he context switches
*after* we set the waiters bit, he cannot miss it.
The only way he could be running before we set the waiters bit, still
running after we check the CPU list, yet not on any CPU when we
checked the list, would be for him to move from a later CPU to an
earlier CPU during the scan. That requires a context switch.
The other possibility is that he wakes up before we set the waiters
bit, loads the lock, we set the waiters bit, he clears the lock, he
goes back to sleep before we see him on-CPU. That's why we check that
the lock still has the waiters bit set before sleeping -- it is
cleared if this occurs. (remember that no-one else can set the waiters
bit on the lock while we hold the turnstile bucket lock)
No timing guarantees needed.
(Thanks, by the way -- I'd never actually thought through all of the
possibilities before, and writing out the above gelled quite a bit of
it)
- jonathan
- Next message: Richard L. Hamilton: "Re: system accounting and space on /var"
- Previous message: Dimitri Maziuk: "Re: solaris 9 ssh ahangs on exit ?"
- In reply to: Joe Seigh: "Re: race on multi-processor solaris"
- Next in thread: Jonathan Adams: "Re: race on multi-processor solaris"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]