Re: race on multi-processor solaris

From: Joe Seigh (jseigh_01_at_xemaps.com)
Date: 12/08/03


Date: Mon, 08 Dec 2003 12:27:48 GMT


David Schwartz wrote:
>
> "Jonathan Adams" <jonathan-ggl@ofb.net> wrote in message
> news:99d37172.0312080208.60fcf535@posting.google.com...
>
> > There is a difference -- adaptive mutexes spin with non-atomic operations,
> > lock-free algorithms spin with atomic ops. The bus traffic is much lighter
> > with the non-atomic operations.
>
> This is especially bad on the P4. All read-modify-write operations
> write, even if they don't have to (like a compare and swap where the compare
> fails). So if you have 8 CPUs and 7 of them fail to make forward progress,
> each of the 7 failures requires ugly FSB traffic to unshare and dirty the
> memory just to write the same value back!

Except that the instant the cpu holding the lock releases it, all the other
processors more or less simultaneously attempt to acquire it. The FSB traffic,
as you put it, doesn't look so good then. I'd say that the FSB traffic is no
worse for a lock-free algorithm than for a lock, assuming that you are using
the same definition of "contention" for both.
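
To make the release-time burst concrete, here is a minimal sketch of a test-and-test-and-set spinlock in C11. It is not the Solaris adaptive mutex; the type and function names (ttas_lock, ttas_try_lock and so on) are made up for illustration. While the lock is held, waiters only do a plain load. As soon as the holder stores "unlocked", every waiter sees the lock free and issues the atomic exchange at about the same time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock for illustration; not the Solaris adaptive mutex. */
typedef struct {
    atomic_bool locked;
} ttas_lock;

void ttas_init(ttas_lock *l)
{
    atomic_init(&l->locked, false);
}

/* One acquisition attempt. Returns true if the caller now owns the lock. */
bool ttas_try_lock(ttas_lock *l)
{
    /* Plain load first: while the lock is held, this can be satisfied from
       the waiter's own cached copy of the line. */
    if (atomic_load_explicit(&l->locked, memory_order_relaxed))
        return false;

    /* The lock looked free, so issue the read-modify-write. This is the
       operation that needs the cache line in exclusive state, and every
       waiter reaches it together right after a release. */
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

void ttas_acquire(ttas_lock *l)
{
    while (!ttas_try_lock(l))
        ; /* spin on the read until the lock looks free */
}

void ttas_release(ttas_lock *l)
{
    /* Debug check that the lock is actually held. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

With N waiters, one exchange succeeds and the other N-1 fail, each having pulled the line exclusive first, which is the same kind of traffic a failed compare-and-swap generates in a lock-free retry loop.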

But it gets worse. If your mutex is in a separate cache line from the data, then
you have an extra cache line bouncing between cpus, albeit at a somewhat slower
rate than the attempts on the mutex. But still, it's extra traffic.

But it gets much worse if the mutex and data are in the same cache line. Once
a cpu gets the lock, the cache line is immediately yanked out from under it by
the other processors' failed attempts to get the lock.

So, you'd probably want to keep that mutex in a separate cache line and eat the
cost of that extra cache line traffic.
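
A rough sketch of the two layouts in C11, assuming a 64-byte cache line (check your target CPU; the real figure varies). The struct names and the same_cache_line helper are hypothetical, made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64 /* assumed line size, not queried from the hardware */

/* Lock and data packed together: they will usually land in one line, so
   failed lock attempts pull the data away from the lock holder. */
struct counter_shared {
    pthread_mutex_t lock;
    long value;
};

/* Lock and data each start on their own line boundary: one extra line
   moves between CPUs, but waiters no longer disturb the data. */
struct counter_split {
    alignas(CACHE_LINE) pthread_mutex_t lock;
    alignas(CACHE_LINE) long value;
};

/* True if two byte offsets fall in the same cache line, assuming the
   enclosing object itself starts on a line boundary. */
int same_cache_line(size_t a, size_t b)
{
    return a / CACHE_LINE == b / CACHE_LINE;
}

/* Compile-time check that the split layout really separates the two. */
static_assert(offsetof(struct counter_split, value) -
              offsetof(struct counter_split, lock) >= CACHE_LINE,
              "lock and value must not share a cache line");
```

The split version costs at least two full lines per object, so it only pays off for locks that are actually contended.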

So, I think if you include everything and don't leave out the bits inconvenient
to your argument, locked vs. lock-free is not as skewed as you think.

Joe Seigh


