Running the network stack without Giant -- what to try and when

From: Robert Watson (rwatson_at_FreeBSD.org)
Date: 07/18/04

    Date: Sun, 18 Jul 2004 01:37:20 -0400 (EDT)
    To: current@FreeBSD.org
    
    

    As many of you have seen from status reports, e-mails, bug reports, etc.,
    the FreeBSD Project has been working for some time on getting the network
    stack to run in parallel on multiple CPUs. We're now at a point where a
    substantial amount of functionality appears to run pretty successfully
    without the Giant lock, and we're ready for more people to start running
    it that way so we can find and fix problems. Let me start by enumerating
    a few caveats:

    - While we've been doing pretty heavy testing in MPSAFE configurations,
      the nature of multiprocessor development and adapting code for MP safety
      means that it's unlikely this will "just work" for all takers. However,
      it may well work for many.

    - We've focussed primarily on getting mainstream network configurations
      to run without Giant: this means that less mainstream subsystems (parts
      of IPv6, some netgraph nodes, IPX, etc.) are currently unsafe without the
      Giant lock turned on. Some less mainstream network devices are also not
      currently able to operate without Giant. There is active work in all of
      these areas to remedy this issue.

    - You may run into hard-to-diagnose problems. We'd like to try to
      diagnose them anyway, but if you start to experience new problems,
      you'll want to go read the Handbook chapter on preparing kernel bug
      reports and diagnosing problems. You'll also want to be prepared to run
      the system with INVARIANTS and WITNESS turned on.

    - Not all workloads will experience a performance benefit -- some, for
      various reasons, will get worse. However, several interesting
      performance loads get measurably better. If you don't see an
      improvement, or you see things get worse, please don't be surprised --
      you may want to look at some of the suggestions I make below on ways to
      make the results more predictable. Generally, you shouldn't see
      substantial performance degradation, if any, but it can't be ruled out,
      especially due to scheduling issues, etc.

    - We can and will destroy your data. We don't mean to, because we like
      your data, and we try not to, but this is, after all, operating system
      development.

    With all that in mind, now is in fact a good time to start experimenting
    with things, as these changes appear to be relatively stable in our
    initial testing. Note that there is some current instability in the CVS
    HEAD, and so I'd ask for some caution in reporting problems as being
    caused by debug.mpsafenet -- it may or may not be our fault :-). I've
    disabled PREEMPTION locally for thread-centric testing, but haven't needed
    to for other testing.

    Here's some technical information on how to get started:

    (1) Determine if all of the stack components you will operate with are
        MPsafe. For common configurations, answering the following questions
        will help you decide this:

            - Are you using IPv6, IPX, ATM, or KAME IPSEC? If you are using
              any of these, it is not yet safe for you to run without Giant.

            - Are you using Netgraph? If yes, it may be that you are not yet
              able to run without Giant. It is worth giving it a try, but you
              may experience panics, etc., especially in MP configurations.

            - Are you using SLIP or kernel PPP (not to be confused with user
              ppp, which is what most FreeBSD users use with modems)?

            - Are you using any physical network interfaces other than the
              following: bge, dc, em, ep, fxp, rl, sis, xl, wi?

              The following may well work: en, gx, pcn, sf. However, they
              have not been marked MPSAFE by the driver maintainer.

              NOTE: Do you maintain a network interface driver? Is it not on
              this list? Shame on you! Or maybe shame on me for not listing
              it, even though it should work. Drop me a private e-mail with
              any questions or comments. Please update the busdma driver
              status web page with your driver's status.

    (2) If you are comfortable that you are using an MPSAFE-supported
        configuration, then you can use the following tunable in loader.conf
        to disable the Giant lock over the network stack on your system:

            debug.mpsafenet="1"

        Note that this is a boot-time only flag; you can inspect the setting
        with a sysctl, but it cannot currently be changed at runtime.

        Do a dmesg and confirm that all your probed network interfaces are
        marked as MPSAFE or not GIANT LOCKED (or whatever we call it now). If
        you have a network interface that is still GIANT LOCKED, it may not be
        able to function correctly with debug.mpsafenet=1. However, if you're
        not actively using it, it probably won't cause a problem. For
        example, firewire network interfaces can't currently be used with
        debug.mpsafenet=1. However, if idle, they shouldn't cause any
        problems. We're currently working to improve compatibility with
        device drivers that aren't MPSAFE, and hope to have a prototype soon.
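
    The two checks in steps (1) and (2) can be sketched as a small script.
    This is only an illustration, not a project tool: the driver lists are
    copied from step (1) above, while the function names, the "GIANT-LOCKED"
    marker spelling, and the "<device>: ..." line layout assumed for dmesg
    output are my assumptions and may not match what your kernel prints.

```python
import re

# Driver lists copied from step (1) of this message.
MPSAFE_DRIVERS = {"bge", "dc", "em", "ep", "fxp", "rl", "sis", "xl", "wi"}
MAY_WORK_DRIVERS = {"en", "gx", "pcn", "sf"}


def classify_interface(ifname):
    """Map an interface name such as 'fxp0' to 'mpsafe', 'may-work' or 'unlisted'."""
    driver = ifname.rstrip("0123456789")  # strip the unit number: fxp0 -> fxp
    if driver in MPSAFE_DRIVERS:
        return "mpsafe"
    if driver in MAY_WORK_DRIVERS:
        return "may-work"
    return "unlisted"


# ASSUMPTION: boot messages look like "<device>: ... GIANT-LOCKED ..." (or
# "GIANT LOCKED"). Adjust the pattern to whatever your dmesg actually shows.
GIANT_MARKER = re.compile(r"^(\w+):.*GIANT[- ]LOCKED", re.MULTILINE)


def giant_locked_devices(dmesg_text):
    """Return the sorted, de-duplicated device names on lines carrying the marker."""
    return sorted(set(GIANT_MARKER.findall(dmesg_text)))
```

    Feed giant_locked_devices() the saved output of dmesg and compare the
    result against the interfaces you actually use. It reports every device
    carrying the marker, not only network interfaces, so expect unrelated
    entries as well.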

    Some notes:

    On SMP-centric performance measurements, such as local UNIX domain socket
    use by MySQL on MP systems, I've observed 30%-40% performance improvements
    by disabling Giant (some details below). My recommended configuration for
    testing out the impact of disabling Giant on MP systems is:

            - Set "options ADAPTIVE_MUTEXES" -- this seems to help a lot with
              contention and load.

            - Disable HTT. In my workloads, which tended to pound the kernel,
              this hurt quite a bit. Obviously, the effectiveness of HTT
              depends on the instruction mix, so this may not be for you.

            - Pick one of ULE and 4BSD, and then try the other. I found 4BSD
              helped a lot for MySQL, but I've seen other benchmarks with quite
              different results.

            - For stability purposes with MySQL, I currently have to disable
              PREEMPTION, as the MySQL benchmarks I use are pretty
              thread-centric and trigger preemption-related bugs with the kernel
              threading bits.

            - If you want to measure performance, make sure to disable
              INVARIANTS, INVARIANTS_SUPPORT, WITNESS, etc.

    Some notes on bug reporting:

            - Make sure to identify that you are running with debug.mpsafenet.
              If the problem is reproducible, make sure to indicate if it goes
              away or persists when you disable debug.mpsafenet.

            - If you appear to be experiencing a hang/deadlock, please try
              running with WITNESS. I'd actually like to see most people
              running with WITNESS for a bit to shake out lock order issues, as
              I've introduced a lot of new lock orders. If you experience lock
              order reversals, please include the full console warning,
              including the stack trace.

            - INVARIANTS also considered good. Even if you aren't running
              with WITNESS, do run with INVARIANTS.

            - If you experience a hang, see if you can get into DDB -- if you
              are having problems getting in using a console break, try a serial
              console. When debugging, please capture at minimum the DDB 'ps'
              output, along with traces of interesting processes. Typically
              the interesting ones will be processes that appear to be
              involved in the hang, etc.
              Obviously, this requires some intuition about what causes the
              hang and I can't offer hard and fast rules here.

            - Experimenting with debug.mpsafenet=1 and UP is also interesting,
              not just SMP. With PREEMPTION turned on, it may result in lower
              latency and/or lower throughput. Or not. Regardless, it's
              interesting -- you don't have to have SMP to give it a spin.

    FYI, while results can and will vary, I was pleased to observe moving from
    a UP->MP speedup of 1.07 on a dual-processor box to a speedup of 1.42 with
    the supersmack benchmark using 11 workers and 1000 select transactions
    with MySQL. For reference, that was with the 4BSD scheduler and adaptive
    mutexes. For loopback netperf with TCP and UDP, I observed no change in
    performance (well, 1% better for UDP RR, but basically no change). Note
    that the MySQL benchmark here is basically a UNIX domain socket IPC test,
    and so real world databases will give pretty different results since they
    won't be pure IPC. The results appear to be very sensitive to the choice
    of scheduler, and for a variety of reasons I've preferred 4BSD during
    recent testing (not least, better results in terms of throughput).
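
    As a rough cross-check of the figures above, the two UP->MP speedups can
    be turned into a relative gain. This sketch assumes both speedups were
    measured against the same UP baseline, which the numbers above do not
    state explicitly; the function name is made up for illustration.

```python
def relative_gain(before, after):
    """Fractional improvement of `after` over `before` (0.25 means 25% better)."""
    return after / before - 1.0


# UP->MP speedups quoted above for supersmack on a dual-processor box.
speedup_with_giant = 1.07
speedup_without_giant = 1.42

# ASSUMPTION: both speedups share one UP baseline, so their ratio is the MP gain.
gain = relative_gain(speedup_with_giant, speedup_without_giant)
print(f"MP throughput gain from dropping Giant: {gain:.1%}")
```

    Under that assumption the gain comes out at roughly a third, which is in
    line with the 30%-40% improvement mentioned earlier for the MySQL UNIX
    domain socket workload.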

    There are a lot of people who have been working on this for quite some
    time -- I can't thank them all here, but I will point at the netperf web
    page as a place to look for ongoing patches, change logs, and some
    credits:

        http://www.watson.org/~robert/freebsd/netperf/

    I try to keep it up to date about once a week or so as I drop new patch
    sets.

    Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
    robert@fledge.watson.org      Principal Research Scientist, McAfee Research

    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"

