Re: FreeBSD 5.3 Network performance tests

From: Robert Watson
Date: 11/11/04

    Date: Thu, 11 Nov 2004 22:36:52 +0000 (GMT)

    On Thu, 11 Nov 2004 wrote:

    > Given these results, I would conclude that the raw routing stack in 5.3
    > is 35-40% slower than its 4.x counterpart.
    > The tests are easy enough to duplicate, so there is no reason to
    > question the numbers. Feel free to try it yourself. Obviously different
    > Mobos and CPUs will yield different numbers, but my experience with this
    > test is that the "differences" between the OS versions are linearly
    > similar on different systems.

    (was just pointed at this thread, sorry if I missed other posts)

    FreeBSD 5.3 sees observably higher per-packet processing costs than the
    4.x branch due to in-progress changes to the synchronization and queueing
    models. Specifically, the SMPng work has changed the interrupt and
    synchronization models throughout the kernel in order to increase
    concurrency and preemptibility (i.e., lower latency in interrupt-based
    processing). However, this has increased the overall overhead of
    synchronization on the stack. The network stack forwarding path is
    particularly sensitive to this, so while other parts of the system see
    immediate concurrency benefits (e.g., socket-centric web servers that now
    see less contention on SMP, and more preemption on UP), this path still
    runs slower for many workloads. We're actively working to remedy this,
    and you will see changes merged to the 6.x and 5.x branches over the next
    couple of months that will cut into the numbers you see above by quite a
    bit. Off the top of my head, I would have expected to see more around a
    15% overhead on UP for the workload you're seeing, but as you point out,
    results can and do vary.
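
    One caution when comparing the "35-40% slower" figure with the "15%
    overhead" figure: a throughput drop and a per-packet cost increase are
    not on the same scale. The sketch below converts between the two. It
    assumes a CPU-bound forwarder where packets per second is simply the
    inverse of CPU time per packet; that model and the function names are
    illustrative assumptions, not something measured in this thread.

```python
def cost_increase_from_throughput_drop(drop):
    """Relative rise in per-packet cost implied by a fractional throughput drop.

    Assumes a CPU-bound forwarder, so packets/sec = 1 / (CPU time per packet).
    """
    if not 0 <= drop < 1:
        raise ValueError("drop must be in [0, 1)")
    return 1.0 / (1.0 - drop) - 1.0


def throughput_drop_from_cost_increase(increase):
    """Inverse: fractional throughput drop implied by a relative cost increase."""
    if increase < 0:
        raise ValueError("increase must be non-negative")
    return 1.0 - 1.0 / (1.0 + increase)


if __name__ == "__main__":
    for drop in (0.35, 0.40):
        extra = cost_increase_from_throughput_drop(drop)
        print(f"{drop:.0%} lower throughput -> {extra:.0%} more CPU per packet")
    drop = throughput_drop_from_cost_increase(0.15)
    print(f"15% more CPU per packet -> {drop:.0%} lower throughput")
```

    Under that assumption, a 35-40% throughput drop corresponds to roughly
    54-67% more CPU time per packet, so it matters which of the two
    quantities a given percentage refers to.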

    There are a number of tunables present in 5.3 that can improve
    performance, which you may want to explore:

    - net.isr.enable, which enables direct dispatch of the network stack from
      the ithread, rather than context switching to the netisr, which adds
      overhead. This is an experimental feature, but works quite well in a
      number of environments to lower both latency (time to process) and
      overhead (cost to process). There is a known bug in inbound UDP
      processing with multiple packet sources on 5.3 with net.isr.enable
      enabled (hence it being experimental), but I will be backporting the fix
      shortly. However, for your workload, this bug won't manifest, as it's
      in address processing for locally delivered packets, not for forwarded
      packets.

    - Make sure that if this is a UP box, you're compiling with a non-SMP
      kernel, as that substantially lowers the synchronization overhead.

    - Device polling, which eliminates the overhead of high rate interrupts,
      which can cause substantial context switching.

    - If your ethernet device supports interrupt coalescing but the thresholds
      are tuned wrong, you may be able to improve their tuning.

    - Disable entropy harvesting for ethernet devices and interrupts. There
      are optimizations present in 6.x that have not yet been backported that
      improve the overhead of entropy harvesting, but you can get the same
      benefits by disabling it. In your environment, it's likely not needed.
      I hope to backport these changes in a couple of weeks to 5-STABLE.

    - If other devices share the same IRQ with your ethernet devices, you may
      want to look at compiling out support for the devices. For example, I
      have a number of Dell boxes where the USB hardware uses the same
      interrupt as the ethernet device on the motherboard. The additional
      overhead associated with processing other devices is non-trivial,
      especially if the order of processing has changed in 5.x due to hardware
      probe order changes, ACPI, etc.
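
    The runtime-settable items in the list above could be collected into a
    sysctl.conf fragment along the following lines. This is only a sketch:
    net.isr.enable is the one name taken from this message. The polling and
    entropy-harvesting OIDs are assumptions about how 5.x names them, and
    polling additionally assumes a kernel built with DEVICE_POLLING; check
    all of them against `sysctl -a` on the target box before relying on them.

```python
# Candidate settings for a FreeBSD 5.3 forwarding box. Only net.isr.enable is
# named in the message; the other OIDs are assumptions to verify with
# `sysctl -a` on the target system before use.
FORWARDING_TUNABLES = [
    ("net.isr.enable", 1, "direct dispatch from the ithread (experimental)"),
    ("kern.polling.enable", 1, "assumed OID; needs a DEVICE_POLLING kernel"),
    ("kern.random.sys.harvest.ethernet", 0, "assumed OID; no ethernet harvesting"),
    ("kern.random.sys.harvest.interrupt", 0, "assumed OID; no interrupt harvesting"),
]


def render_sysctl_conf(tunables):
    """Format (name, value, note) triples as commented sysctl.conf lines."""
    lines = []
    for name, value, note in tunables:
        lines.append(f"# {note}")
        lines.append(f"{name}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_sysctl_conf(FORWARDING_TUNABLES), end="")
```

    Changing one setting at a time and re-running the forwarding test after
    each change makes it easier to attribute any difference to a specific
    tunable.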

    Something I'd be interested in seeing you measure, since you have a
    specific test environment configured, is the incremental cost of adding a
    thousand firewall rules. The synchronization costs for firewall processing
    are based on entry to the firewall code, and don't apply to each rule. So
    you may find that while the cost of entering the first rule is higher in
    5.x, the cost to process additional rules is the same or lower, due to
    other optimizations, compiler improvements, etc.
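
    One way to separate those two components from measurements is sketched
    below. It assumes per-packet time grows linearly with the number of
    rules traversed and that every test packet traverses all loaded rules
    (for example, non-matching rules ahead of a final allow rule). The
    function name and the numbers in the usage example are hypothetical,
    not results from this thread.

```python
def split_firewall_cost(t_base, t_few, n_few, t_many, n_many):
    """Split per-packet time into a fixed entry cost and a per-rule cost.

    t_base:  per-packet time with the firewall disabled
    t_few:   per-packet time with n_few rules loaded
    t_many:  per-packet time with n_many rules loaded
    All times share one unit (e.g. microseconds). Assumes linear growth
    and that every packet traverses all loaded rules.
    """
    if n_many <= n_few:
        raise ValueError("n_many must exceed n_few")
    # Slope between the two firewall-enabled measurements.
    per_rule = (t_many - t_few) / (n_many - n_few)
    # What remains after removing the rule cost and the no-firewall baseline.
    entry = t_few - n_few * per_rule - t_base
    return entry, per_rule


if __name__ == "__main__":
    # Hypothetical per-packet times in microseconds, for illustration only.
    entry, per_rule = split_firewall_cost(10.0, 12.05, 1, 62.0, 1000)
    print(f"entry cost: {entry:.2f} us, per-rule cost: {per_rule:.3f} us")
```

    Running the same three measurements on 4.x and 5.3 would show whether
    the difference sits in the entry cost, the per-rule cost, or both.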

    You can find information on the on-going network performance work at the
    following locations:

    I've just put a new web page online at:

    However, that page has probably not been rebuilt on most of the web server
    mirrors yet, so it might take a day or two to become reachable.

    There's quite an active team working on the netperf work, so as I
    mentioned above, while there is additional overhead for some paths
    currently, you should see improvements in the near future. Packet
    bridging and packet forwarding are both considered critical optimization
    targets for 5.4 (and 5-STABLE before then). One of the things we would
    find most helpful is people with interesting and useful workloads who are
    able to measure the impact of change proposals to improve performance. So
    if you're able to use this test environment to help us test changes in the
    pipeline, it would be much appreciated.


    Robert N M Watson
    FreeBSD Core Team, TrustedBSD Projects
    Principal Research Scientist, McAfee Research
