RE: 4.7 vs 5.2.1 SMP/UP bridging performance

From: Bruce Evans (bde_at_zeta.org.au)
Date: 05/05/04

  • Next message: Bruce Evans: "Re[3]: Modem + Network in Xircom cards, and maybe others"
    Date: Wed, 5 May 2004 23:32:18 +1000 (EST)
    To: Gerrit Nagelhout <gnagelhout@sandvine.com>
    
    

    On Tue, 4 May 2004, Gerrit Nagelhout wrote:

    > I ran the following fragment of code to determine the cost of a LOCK &
    > UNLOCK on both UP and SMP:
    >
    > #define EM_LOCK(_sc) mtx_lock(&(_sc)->mtx)
    > #define EM_UNLOCK(_sc) mtx_unlock(&(_sc)->mtx)
    >
    > unsigned int startTime, endTime, delta;
    > startTime = rdtsc();
    > for (i = 0; i < 100; i++)
    > {
    > EM_LOCK(adapter);
    > EM_UNLOCK(adapter);
    > }
    > endTime = rdtsc();
    > delta = endTime - startTime;
    > printf("delta %u start %u end %u \n", (unsigned int)delta, startTime,
    > endTime);
    >
    > On a single hyperthreaded xeon 2.8Ghz, it took ~30 cycles (per LOCK&UNLOCK,
    > and dividing by 100) under UP, and ~300 cycles for SMP. Assuming 10
    > locks for every packet(which is conservative), at 500Kpps, this accounts
    > for:
    > 300 * 10 * 500000 = 1.5 billion cycles (out of 2.8 billion cycles)
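
The estimate above is easy to re-run with other per-lock costs. What follows is a minimal sketch in C, not code from the thread; the function names `lock_cycles_per_second` and `lock_cpu_fraction` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles spent per second on LOCK & UNLOCK pairs. */
uint64_t lock_cycles_per_second(uint64_t cycles_per_pair,
                                uint64_t locks_per_packet,
                                uint64_t packets_per_second)
{
    return cycles_per_pair * locks_per_packet * packets_per_second;
}

/* Fraction of one CPU's cycle budget (cpu_hz) consumed by locking. */
double lock_cpu_fraction(uint64_t lock_cycles, uint64_t cpu_hz)
{
    assert(cpu_hz > 0);
    return (double)lock_cycles / (double)cpu_hz;
}
```

With the quoted figures (300 cycles, 10 locks per packet, 500000 packets per second, 2.8 GHz) this gives 1,500,000,000 cycles, a little over half of the CPU. The 30-cycle UP figure gives 150,000,000 cycles, roughly 5%.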

    300 cycles seems far too much. I get the following times for slightly
    simpler locking in userland:

    %%%
    #define _KERNEL
    #include ...

    int slock;
    ...
            for (i = 0; i < 1000000; i++) {
                    while (atomic_cmpset_acq_int(&slock, 0, 1) == 0)
                            ;
                    atomic_store_rel_int(&slock, 0);
            }
    %%%

                                 !SMP case    SMP case
    Athlon XP2600 UP system:     22 cycles    37 cycles
    Celeron 366 SMP system:      35 cycles    48 cycles

    The extra cycles for the SMP case are just the extra cost of one lock
    instruction. Note that SMP should cost twice as much extra, but the
    non-SMP atomic_store_rel_int(&slock, 0) is pessimized by using xchgl,
    which always locks the bus. After fixing this:

                                 !SMP case    SMP case
    Athlon XP2600 UP system:      6 cycles    37 cycles
    Celeron 366 SMP system:      10 cycles    48 cycles
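
For readers without a FreeBSD tree at hand, the comparison can be approximated in portable C11. This is only a sketch under stated assumptions: it uses `<stdatomic.h>` in place of the `atomic_cmpset_acq_int` and `atomic_store_rel_int` calls in the loop above, it reports nanoseconds from `timespec_get` rather than cycles, and every function name in it is hypothetical. On x86, compilers typically emit a plain `mov` for the release store and an `xchg` (implicitly locked) for the exchange, so the two unlock variants roughly correspond to the fixed and the pessimized cases.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

static atomic_int slock;

/* Wall-clock time in nanoseconds (C11 timespec_get; not a cycle counter). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Acquire: spin until slock changes from 0 to 1. */
static void spin_lock(void)
{
    int expected = 0;
    while (!atomic_compare_exchange_weak_explicit(&slock, &expected, 1,
               memory_order_acquire, memory_order_relaxed))
        expected = 0;  /* a failed CAS overwrites expected; reset it */
}

/* Release with a plain store. */
static void spin_unlock_store(void)
{
    atomic_store_explicit(&slock, 0, memory_order_release);
}

/* Release with an exchange, which is the more expensive variant on x86. */
static void spin_unlock_xchg(void)
{
    (void)atomic_exchange_explicit(&slock, 0, memory_order_release);
}

/* Average cost of one uncontested lock/unlock pair, in nanoseconds. */
double ns_per_pair(long iterations, int use_xchg)
{
    uint64_t start = now_ns();
    for (long i = 0; i < iterations; i++) {
        spin_lock();
        if (use_xchg)
            spin_unlock_xchg();
        else
            spin_unlock_store();
    }
    uint64_t end = now_ns();
    assert(atomic_load(&slock) == 0);  /* lock must be free afterwards */
    return (double)(end - start) / (double)iterations;
}
```

Calling `ns_per_pair(1000000, 0)` and `ns_per_pair(1000000, 1)` and multiplying each result by the clock rate in GHz gives a rough cycles-per-pair figure. The numbers will not match the ones above, since the hardware, compiler, and timer all differ.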

    Mutexes take longer than simple locks, but not much longer unless the
    lock is contested. In particular, they don't lock the bus any more
    often, and the extra cycles for bus locking dominate (even in the !SMP
    case, due to the pessimization).

    So there seems to be something wrong with your benchmark. Locking the
    bus for the SMP case always costs about 20+ cycles, but this hasn't
    changed since RELENG_4 and mutexes can't be made much faster in the
    uncontested case since their overhead is dominated by the bus lock
    time.

    -current is slower than RELENG_4, especially for networking, because
    it does lots more locking and may contest locks more, and when it hits
    a contested lock, and for some other operations, it does slow context
    switches. Your profile didn't seem to show much of the latter two, so
    the problem for bridging may be that there is just too much
    fine-grained locking.

    The profile didn't seem quite right. It was missing all the call counts
    and times. The times are not useful for short runs unless high
    resolution profiling is used, but the call counts are. Profiling has
    been broken in -current since last November so some garbage needs to
    be ignored to interpret profiles.

    Bruce
    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"

