RE: 4.7 vs 5.2.1 SMP/UP bridging performance

From: Robert Watson (rwatson_at_freebsd.org)
Date: 05/05/04

    Date: Wed, 5 May 2004 17:49:34 -0400 (EDT)
    To: Gerrit Nagelhout <gnagelhout@sandvine.com>
    
    

    On Tue, 4 May 2004, Gerrit Nagelhout wrote:

    > I ran the following fragment of code to determine the cost of a LOCK &
    > UNLOCK on both UP and SMP:
    >
    > #define EM_LOCK(_sc) mtx_lock(&(_sc)->mtx)
    > #define EM_UNLOCK(_sc) mtx_unlock(&(_sc)->mtx)
    >
    > unsigned int startTime, endTime, delta;
    > startTime = rdtsc();
    > for (i = 0; i < 100; i++)
    > {
    > EM_LOCK(adapter);
    > EM_UNLOCK(adapter);
    > }
    > endTime = rdtsc();
    > delta = endTime - startTime;
    > printf("delta %u start %u end %u \n", (unsigned int)delta, startTime,
    > endTime);
    >
    > On a single hyperthreaded Xeon 2.8GHz, it took ~30 cycles (per
    > LOCK&UNLOCK, after dividing by 100) under UP, and ~300 cycles for SMP.
    > Assuming 10 locks for every packet (which is conservative), at 500Kpps,
    > this accounts for: 300 * 10 * 500000 = 1.5 billion cycles (out of 2.8
    > billion cycles). Any comments?
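
The cycle budget quoted above can be checked with a few lines of C. This is only a sketch of the arithmetic: the function names lock_cycles_per_sec and lock_cpu_percent are made up for illustration and do not exist in the kernel or the em driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Cycles spent per second on lock/unlock pairs, given the measured cost of
 * one pair, the assumed number of pairs per packet, and the packet rate.
 */
static uint64_t
lock_cycles_per_sec(uint64_t cycles_per_pair, uint64_t pairs_per_packet,
    uint64_t packets_per_sec)
{
	return (cycles_per_pair * pairs_per_packet * packets_per_sec);
}

/*
 * Share of one CPU consumed by those cycles, as a whole percentage
 * (truncated toward zero).
 */
static unsigned int
lock_cpu_percent(uint64_t lock_cycles, uint64_t cpu_hz)
{
	assert(cpu_hz != 0);
	return ((unsigned int)(lock_cycles * 100 / cpu_hz));
}
```

With the SMP figures from the message (300 cycles, 10 locks per packet, 500000 packets per second, 2.8 billion cycles per second) this gives 1.5 billion cycles, or roughly 53% of the CPU. The UP figure of 30 cycles gives about 5% under the same assumptions.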

    One of the sets of changes I have in a local branch performs coalescing
    of interface unlock/lock operations. Right now, if you look at the
    incoming packet handling in interface code, it tends to read:

       struct mbuf *m;

       while (packets_ready(sc)) {
            m = read_packet(sc);
            XX_UNLOCK(sc);
            ifp->if_input(ifp, m);
            XX_LOCK(sc);
       }

    I revised the structure for some testing as follows:

       struct mbuf *m, *mqueue, *mqueue_tail;

       mqueue = mqueue_tail = NULL;
       /* Pull all ready packets into a local FIFO while the lock is held. */
       while (packets_ready(sc)) {
           m = read_packet(sc);
           if (mqueue != NULL) {
                mqueue_tail->m_nextpkt = m;
                mqueue_tail = m;
           } else
                mqueue = mqueue_tail = m;
       }
       if (mqueue != NULL) {
           XX_UNLOCK(sc);
           while (mqueue != NULL) {
               m = mqueue;
               mqueue = mqueue->m_nextpkt;
               m->m_nextpkt = NULL;
               ifp->if_input(ifp, m);
           }
           XX_LOCK(sc);
       }
          
    Obviously, if done properly, you'd want to bound the size of the temporary
    queue, etc, etc, but even in basic testing I wasn't able to measure an
    improvement on the hardware I had on-hand at the time. However, I need to
    re-run this in a post-netperf world and with 64-bit PCI and see if it does
    now. One important thing in this process, though, is to avoid reordering
    of packets -- they need to remain serialized by source interface. Doing
    it at this queue is easy, but if we start passing chains of packets into
    other pieces, we'll need to be careful where multiple queues get involved,
    etc. Even simple and relatively infrequent packet reordering can cause
    TCP to get pretty unhappy.
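
The bounding and ordering points above can be sketched in userspace C. This is a minimal illustration, not driver code: struct pkt stands in for struct mbuf, and pkt_batch, batch_init, batch_append, batch_drain, seq_log and log_seq are all hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mbuf; only the packet link is modeled. */
struct pkt {
	struct pkt *nextpkt;
	int seq;			/* arrival order, for checking */
};

/* Bounded FIFO of packets collected while the interface lock is held. */
struct pkt_batch {
	struct pkt *head;
	struct pkt *tail;
	size_t len;
	size_t limit;
};

static void
batch_init(struct pkt_batch *b, size_t limit)
{
	assert(limit > 0);
	b->head = b->tail = NULL;
	b->len = 0;
	b->limit = limit;
}

/* Append at the tail so arrival order is kept; returns 0 when full. */
static int
batch_append(struct pkt_batch *b, struct pkt *p)
{
	if (b->len >= b->limit)
		return (0);
	p->nextpkt = NULL;
	if (b->tail != NULL)
		b->tail->nextpkt = p;
	else
		b->head = p;
	b->tail = p;
	b->len++;
	return (1);
}

/* Hand packets to deliver() from the head, oldest first; returns the count. */
static size_t
batch_drain(struct pkt_batch *b, void (*deliver)(struct pkt *, void *),
    void *arg)
{
	struct pkt *p;
	size_t n = 0;

	while ((p = b->head) != NULL) {
		b->head = p->nextpkt;
		p->nextpkt = NULL;
		deliver(p, arg);
		n++;
	}
	b->tail = NULL;
	b->len = 0;
	return (n);
}

/* Example callback: records sequence numbers in delivery order. */
struct seq_log {
	int seq[8];
	size_t n;
};

static void
log_seq(struct pkt *p, void *arg)
{
	struct seq_log *log = arg;

	if (log->n < 8)
		log->seq[log->n++] = p->seq;
}
```

In a driver loop shaped like the one above, the receive path would stop pulling packets once batch_append() returns 0, drop the lock, drain the batch, and retake the lock. Because packets are appended at the tail and delivered from the head, they leave in the order they arrived on that interface.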

    The fact that the above didn't help performance suggests two things:
    first, that my testbed has other bottlenecks, such as PCI bus bandwidth,
    and second, that the primary cost currently involved isn't from these
    mutexes.

    Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
    robert@fledge.watson.org      Senior Research Scientist, McAfee Research


