RE: 4.7 vs 5.2.1 SMP/UP bridging performance

From: Don Bowman (don_at_sandvine.com)
Date: 05/06/04

    To: 'Bruce Evans' <bde@zeta.org.au>, Andrew Gallatin <gallatin@cs.duke.edu>
    Date: Thu, 6 May 2004 09:52:27 -0400 
    
    

    From: Bruce Evans [mailto:bde@zeta.org.au]
    > On Wed, 5 May 2004, Andrew Gallatin wrote:
    >
     ...

    > >
    > > Actually, I think his tests are accurate and bus locked instructions
    > > take an eternity on P4. See
    > > http://www.uwsg.iu.edu/hypermail/linux/kernel/0109.3/0687.html
    > >
    > > For example, with your test above, I see 212 cycles for the UP case on
    > > a 2.53GHz P4. Replacing the atomic_store_rel_int(&slock, 0) with a
    > > simple slock = 0; reduces that count to 18 cycles.
    >
    > This seems to be right, unfortunately. I wonder if this has anything to
    > do with freebsd.org having no P4 machines.
    >
    > > If it's really safe to remove the xchg* from non-SMP atomic_store_rel*,
    > > then I think you should do it. Of course, that still leaves mutexes
    > > as very expensive on SMP (253 cycles on the 2.53GHz from above).
    >
    > I forgot (again) that there are memory access ordering issues. A lock
    > may be needed to get everything synced. See the comment before the i386
    > versions in i386/include/atomic.h. A single lock may be enough. The
    > best example I could think of easily is:

    On the P4, there are mfence, lfence, and sfence instructions to enforce
    memory ordering. These are cheaper than "lock; andl" or "cpuid",
    which are the traditional 'sync' instructions.
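A minimal sketch of the two barrier styles mentioned here, assuming GCC/Clang inline assembly on i386/amd64 and falling back to the portable C11 atomic_thread_fence elsewhere. The names fence_mfence, fence_locked_and, publish and try_consume are hypothetical, chosen for illustration; they are not FreeBSD interfaces.

```c
#include <stdatomic.h>

/* Full barrier via mfence (SSE2, present on the P4). On other targets,
 * fall back to a C11 sequentially consistent fence. The "memory" clobber
 * also stops the compiler from moving loads/stores across the asm. */
static inline void fence_mfence(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* The traditional alternative: a locked read-modify-write that leaves the
 * top-of-stack word unchanged (AND with all ones). */
static inline void fence_locked_and(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("lock; andl $-1, (%%rsp)" ::: "memory", "cc");
#elif defined(__i386__)
    __asm__ __volatile__("lock; andl $-1, (%%esp)" ::: "memory", "cc");
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Hypothetical publish/consume pair: the writer stores the payload, fences,
 * then sets the flag; the reader checks the flag, fences, then reads. */
static atomic_int payload;
static atomic_int ready;

void publish(int value)
{
    atomic_store_explicit(&payload, value, memory_order_relaxed);
    fence_mfence();
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Returns 1 and stores the payload in *out once published, else 0. */
int try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;
    fence_locked_and();
    *out = atomic_load_explicit(&payload, memory_order_relaxed);
    return 1;
}
```

lfence orders loads and sfence orders stores, so neither one alone is a full barrier; that job falls to mfence or a locked instruction. On ordinary write-back memory x86 already keeps stores in order with stores and loads in order with loads, so in this particular pattern the fences matter mostly through the C11 fallback on weaker architectures. The relative cost of mfence and a locked instruction also differs between CPU generations, so it is worth measuring on the target machine.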

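The quoted measurements (212 cycles with atomic_store_rel_int versus 18 with a plain slock = 0) can be approximated with a small harness like the one below. It is a sketch under stated assumptions: it uses C11 <stdatomic.h> rather than FreeBSD's atomic(9), times in wall-clock nanoseconds via timespec_get rather than in cycles, and the names demo_slock, release_with_xchg, release_with_plain_store and ns_per_release are hypothetical. The plain-store variant assumes, as the quoted proposal does, a uniprocessor where only compiler reordering has to be prevented.

```c
#include <stdatomic.h>
#include <time.h>

/* Hypothetical lock word standing in for `slock` in the quoted test. */
static atomic_uint demo_slock;

/* SMP-style release: a full exchange. On x86, xchg with a memory operand
 * is implicitly bus-locked, so this pays the locked-instruction cost. */
void release_with_xchg(atomic_uint *lock)
{
    (void)atomic_exchange_explicit(lock, 0u, memory_order_seq_cst);
}

/* UP-style release (assumption: one CPU, no other bus masters): a
 * compiler-only barrier followed by an ordinary store, i.e. "slock = 0;". */
void release_with_plain_store(atomic_uint *lock)
{
    atomic_signal_fence(memory_order_release);
    atomic_store_explicit(lock, 0u, memory_order_relaxed);
}

/* Wall-clock time in nanoseconds via C11 timespec_get (not cycle-accurate). */
static double now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Average nanoseconds per take/release pair using the given release. */
double ns_per_release(void (*release)(atomic_uint *), unsigned long iters)
{
    double t0 = now_ns();
    for (unsigned long i = 0; i < iters; i++) {
        atomic_store_explicit(&demo_slock, 1u, memory_order_relaxed);
        release(&demo_slock);
    }
    return (now_ns() - t0) / (double)iters;
}
```

Calling ns_per_release(release_with_xchg, 1000000) and ns_per_release(release_with_plain_store, 1000000) on the same machine gives a rough per-operation comparison; for cycle counts like those quoted, a cycle counter such as rdtsc would be needed instead.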
    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
