Re: 4.7 vs 5.2.1 SMP/UP bridging performance

From: John Baldwin (jhb_at_FreeBSD.org)
Date: 05/06/04

    To: freebsd-current@FreeBSD.org
    Date: Thu, 6 May 2004 14:17:51 -0400
    
    

    On Thursday 06 May 2004 01:18 pm, Bruce Evans wrote:
    > On Thu, 6 May 2004, Bruce M Simpson wrote:
    > > On Thu, May 06, 2004 at 10:15:44AM -0400, Andrew Gallatin wrote:
    > > > For what it's worth, using those operations yields these results
    > > > on my 2.53GHz P4 (for UP)
    > > >
    > > > Mutex (atomic_store_rel_int) cycles per iteration: 208
    > > > Mutex (sfence) cycles per iteration: 85
    > > > Mutex (lfence) cycles per iteration: 63
    > > > Mutex (mfence) cycles per iteration: 169
    > > > Mutex (none) cycles per iteration: 18
    > > >
    > > > lfence looks like a winner..
    > >
    > > Please be aware, though, that the different FENCE instructions are acting
    > > as fences against different things. The NASM documentation has a good
    > > quick reference for what each of the instructions do, but the definitive
    > > reference is Intel's IA-32 programmer's reference manuals.
    >
    > They are also documented in amd64 manuals.
    >
    > Don't they all act as fences only on the same CPU, so they are no help
    > for SMP? They are still almost twice as slow as full locks on Athlons,
    > so hopefully they do more.

    They are traditional memory barriers, like membar on Sparc or acq/rel on
    ia64. A membar only has to apply to the current CPU, but you have to use it
    in conjunction with the memory address used to implement a lock. Thus, when
    you acquire a lock, you want to use an lfence (assuming lfence is like ia64
    acq and sfence is like ia64 rel) to ensure that the CPU won't perform any
    later loads ahead of the lfence. This ensures that you don't read any of
    the locked values until you have the lock. On release you would use an
    sfence to ensure that all earlier stores are performed before the store
    that actually releases the lock. The fence doesn't push the pending writes
    out to the other CPUs. However, it does mean that another CPU won't see
    that the lock is released unless it can also see all the other stores
    before the sfence.

    Because the writes aren't pushed out, you can actually have a CPU spin
    waiting for a lock that is already unlocked. I've seen this on my test
    Alpha (DS20), where CPU0 unlocked sched_lock, CPU1 logged a KTR trace
    saying it was starting to spin on sched_lock, and a short time later CPU1
    logged that it had gotten sched_lock. I'm not sure if *fence is quite that
    weak, though it might be. Note that each generation of ia32 processors
    seems to have a weaker memory model than the previous generation.
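
To make the acquire/release pairing above concrete, here is a minimal sketch
in portable C11 rather than with raw lfence/sfence. It is not FreeBSD's mutex
code: struct toy_lock and the toy_* functions are hypothetical names invented
for this illustration, and memory_order_acquire / memory_order_release stand
in for the ia64-style acq/rel semantics described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy spin lock for illustration; not FreeBSD's struct mtx. */
struct toy_lock {
    atomic_bool held;       /* the memory address that implements the lock */
    int protected_value;    /* data that may only be touched while held */
};

/* Try once to take the lock; returns true if we got it. */
static inline bool
toy_trylock(struct toy_lock *l)
{
    bool expected = false;

    /*
     * Acquire ordering on success: accesses to the protected data that
     * follow in program order cannot be performed before this operation.
     */
    return atomic_compare_exchange_strong_explicit(&l->held, &expected,
        true, memory_order_acquire, memory_order_relaxed);
}

/* Spin until the lock is ours. */
static inline void
toy_lock_acquire(struct toy_lock *l)
{
    while (!toy_trylock(l))
        ;   /* another CPU may keep us here briefly even after it unlocks */
}

/* Release the lock. */
static inline void
toy_unlock(struct toy_lock *l)
{
    assert(atomic_load_explicit(&l->held, memory_order_relaxed));

    /*
     * Release ordering: every store made while holding the lock becomes
     * visible to another CPU no later than this store does.
     */
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

Note that nothing here forces the release store out to other CPUs right away;
the orderings only constrain what a CPU can observe once it does see the lock
as free, which is the same distinction drawn above.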

    -- 
    John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
    "Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
    
