RE: 4.7 vs 5.2.1 SMP/UP bridging performance

From: Andrew Gallatin (gallatin_at_cs.duke.edu)
Date: 05/05/04

    Date: Wed, 5 May 2004 17:23:30 -0400 (EDT)
    To: Bruce Evans <bde@zeta.org.au>
    
    

    Bruce Evans writes:

    >
    > Athlon XP2600 UP system:  !SMP case: 22 cycles  SMP case: 37 cycles
    > Celeron 366 SMP system:   !SMP case: 35 cycles  SMP case: 48 cycles
    >
    > The extra cycles for the SMP case are just the extra cost of a one lock
    > instruction. Note that SMP should cost twice as much extra, but the
    > non-SMP atomic_store_rel_int(&slock, 0) is pessimized by using xchgl
    > which always locks the bus. After fixing this:
    >
    > Athlon XP2600 UP system:  !SMP case:  6 cycles  SMP case: 37 cycles
    > Celeron 366 SMP system:   !SMP case: 10 cycles  SMP case: 48 cycles
    >
    > Mutexes take longer than simple locks, but not much longer unless the
    > lock is contested. In particular, they don't lock the bus any more
    > and the extra cycles for locking dominate (even in the !SMP case due
    > to the pessimization).
    >
    > So there seems to be something wrong with your benchmark. Locking the
    > bus for the SMP case always costs about 20+ cycles, but this hasn't
    > changed since RELENG_4 and mutexes can't be made much faster in the
    > uncontested case since their overhead is dominated by the bus lock
    > time.
    >

    Actually, I think his tests are accurate and bus-locked instructions
    take an eternity on the P4. See
    http://www.uwsg.iu.edu/hypermail/linux/kernel/0109.3/0687.html

    For example, with your test above, I see 212 cycles for the UP case on
    a 2.53GHz P4. Replacing the atomic_store_rel_int(&slock, 0) with a
    simple slock = 0; reduces that count to 18 cycles.
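
    The comparison above can be approximated in userland. The following is
    a minimal sketch, not the test Bruce posted: it assumes C11 atomics in
    place of the kernel's atomic_store_rel_int(), wall-clock nanoseconds
    from clock_gettime() in place of a cycle counter, and hypothetical
    helper names (release_xchg, release_store, ns_per_iter). On x86,
    atomic_exchange() normally compiles to xchgl (implicitly locked), while
    a release store normally compiles to a plain movl.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Stand-in for the lock word named slock in the thread. */
static atomic_int slock;

/* Release by exchange: on x86 this is typically xchgl, which locks. */
static void release_xchg(void)
{
    (void)atomic_exchange(&slock, 0);
}

/* Release by store with release ordering: on x86 typically a plain movl. */
static void release_store(void)
{
    atomic_store_explicit(&slock, 0, memory_order_release);
}

/* Monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Average nanoseconds per "mark held, then release" iteration.
 * The relaxed store of 1 is only a placeholder for a real acquire.
 */
static double ns_per_iter(void (*release)(void), unsigned long iters)
{
    assert(iters > 0);
    uint64_t start = now_ns();
    for (unsigned long i = 0; i < iters; i++) {
        atomic_store_explicit(&slock, 1, memory_order_relaxed);
        release();
    }
    return (double)(now_ns() - start) / (double)iters;
}
```

    Calling ns_per_iter(release_xchg, n) and ns_per_iter(release_store, n)
    with a large n and comparing the two averages gives a rough idea of
    the cost of the locked instruction on a given CPU. The absolute
    numbers include loop and call overhead and will not match the cycle
    counts quoted in this thread.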

    If it's really safe to remove the xchg* from non-SMP atomic_store_rel*,
    then I think you should do it. Of course, that still leaves mutexes
    as very expensive on SMP (253 cycles on the 2.53GHz from above).
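
    To put the cycle counts above in wall-clock terms, divide by the clock
    rate in GHz. A small sketch (cycles_to_ns is a hypothetical helper,
    not something from the tree): at 2.53GHz, 212 cycles is roughly 84 ns,
    18 cycles is roughly 7 ns, and 253 cycles is roughly 100 ns.

```c
#include <assert.h>
#include <math.h>

/*
 * Convert a cycle count to nanoseconds for a given clock rate.
 * One cycle at f GHz lasts 1/f ns, so N cycles last N/f ns.
 */
static double cycles_to_ns(double cycles, double ghz)
{
    assert(ghz > 0.0);
    return cycles / ghz;
}
```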

    Drew

    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"

