RE: 4.7 vs 5.2.1 SMP/UP bridging performance

From: Andrew Gallatin (gallatin_at_cs.duke.edu)
Date: 05/05/04

    Date: Wed, 5 May 2004 17:23:30 -0400 (EDT)
    To: Bruce Evans <bde@zeta.org.au>
    
    

    Bruce Evans writes:

    >
    >                           !SMP case    SMP case
    > Athlon XP2600 UP system:  22 cycles    37 cycles
    > Celeron 366 SMP system:   35 cycles    48 cycles
    >
    > The extra cycles for the SMP case are just the extra cost of a one lock
    > instruction. Note that SMP should cost twice as much extra, but the
    > non-SMP atomic_store_rel_int(&slock, 0) is pessimized by using xchgl
    > which always locks the bus. After fixing this:
    >
    >                           !SMP case    SMP case
    > Athlon XP2600 UP system:   6 cycles    37 cycles
    > Celeron 366 SMP system:   10 cycles    48 cycles
    >
    > Mutexes take longer than simple locks, but not much longer unless the
    > lock is contested. In particular, they don't lock the bus any more
    > and the extra cycles for locking dominate (even in the !SMP case due
    > to the pessimization).
    >
    > So there seems to be something wrong with your benchmark. Locking the
    > bus for the SMP case always costs about 20+ cycles, but this hasn't
    > changed since RELENG_4 and mutexes can't be made much faster in the
    > uncontested case since their overhead is dominated by the bus lock
    > time.
    >

    Actually, I think his tests are accurate, and bus-locked instructions
    take an eternity on the P4. See
    http://www.uwsg.iu.edu/hypermail/linux/kernel/0109.3/0687.html

    For example, with your test above, I see 212 cycles for the UP case on
    a 2.53GHz P4. Replacing the atomic_store_rel_int(&slock, 0) with a
    simple slock = 0; reduces that count to 18 cycles.
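
    As a rough illustration of that comparison, here is a minimal
    userland sketch in C11. It is not the kernel test quoted above:
    the names release_xchg, release_store and ns_per_iter are
    hypothetical, C11 atomics stand in for atomic_store_rel_int(), and
    it reports wall-clock nanoseconds rather than cycles. It assumes
    an x86 compiler that emits xchg for the exchange and a plain mov
    for the release store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the "slock" variable in the quoted test. */
static atomic_int slock;

/* Release with an atomic exchange. On x86 this normally compiles to
 * xchg with a memory operand, which is implicitly locked. */
static void release_xchg(atomic_int *l)
{
    (void)atomic_exchange_explicit(l, 0, memory_order_seq_cst);
}

/* Release with a store that has release ordering. On x86 this
 * normally compiles to a plain mov with no lock prefix. */
static void release_store(atomic_int *l)
{
    atomic_store_explicit(l, 0, memory_order_release);
}

/* Average time for one "take the lock, release it" pair, in ns.
 * The function-pointer call adds the same overhead to both variants. */
static double ns_per_iter(void (*release)(atomic_int *), unsigned long iters)
{
    struct timespec t0, t1;

    timespec_get(&t0, TIME_UTC);
    for (unsigned long i = 0; i < iters; i++) {
        /* Pretend to take the (uncontested) lock, then release it. */
        atomic_store_explicit(&slock, 1, memory_order_relaxed);
        release(&slock);
    }
    timespec_get(&t1, TIME_UTC);
    assert(atomic_load(&slock) == 0);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
                (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}

#ifdef SLOCK_BENCH_MAIN
/* Build with -DSLOCK_BENCH_MAIN to get a standalone program. */
int main(void)
{
    const unsigned long iters = 10000000UL;

    printf("xchg release:  %.2f ns/iter\n", ns_per_iter(release_xchg, iters));
    printf("store release: %.2f ns/iter\n", ns_per_iter(release_store, iters));
    return 0;
}
#endif
```

    To relate the output to the cycle counts above: at 2.53GHz one
    cycle is about 0.395 ns, so 212 cycles is roughly 84 ns and 18
    cycles is roughly 7 ns.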

    If it's really safe to remove the xchg* from non-SMP atomic_store_rel*,
    then I think you should do it. Of course, that still leaves mutexes
    as very expensive on SMP (253 cycles on the 2.53GHz P4 from above).

    Drew

    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"

