Re: Old SUN NFS performance papers.

From: Eric Anderson (anderson_at_centtech.com)
Date: 01/19/04

  • Next message: Eric Anderson: "Re: Old SUN NFS performance papers."
    Date: Mon, 19 Jan 2004 10:34:15 -0600
    To: Steve Francis <steve@expertcity.com>
    
    

    Steve Francis wrote:

    > Benchmarking seems like the best thing to do, however I have some info
    > I've collected from prior posts:
    >
    > These are from the thread "Slow disk write speeds over network" on
    > freebsd-performance@freebsd.org, all written by Terry Lambert.
    >
    > you should definitely use TCP with FreeBSD NFS servers; it's
    > also just generally a good idea, since UDP frags act as a fixed
    > non-sliding window: NFS over UDP sucks.
    >
    > Also, you haven't said whether you are using aliases on your
    > network cards; aliases and NFS tend to interact badly.
    >
    > Finally, you probably want to tweak some sysctl's, e.g.
    >
    > net.inet.ip.check_interface=0
    > net.inet.tcp.inflight_enable=1
    > net.inet.tcp.inflight_debug=0
    > net.inet.tcp.msl=3000
    > net.inet.tcp.inflight_min=6100
    > net.isr.enable=1
    >
    > Given your overloading of your bus, that last one is probably
    > the most important one: it enables direct dispatch.
    >
    > You'll also want to enable DEVICE_POLLING in your kernel
    > config file (assuming you have a good ethernet card whose
    > driver supports it):
    >
    > options DEVICE_POLLING
    > options HZ=2000
    >
    >
    > ...and yet more sysctl's for this:
    >
    > kern.polling.enable=1
    > kern.polling.user_frac=50 # 0..100; whatever works best
    >
    > If you've got a really terrible Gigabit Ethernet card, then
    > you may be copying all your packets over again (e.g. m_pullup()),
    > and that could be eating your bus, too.
    >
    >
    >> Huh. I thought that the conventional wisdom was that on a local network
    >> with no packet loss (and therefore no re-transmission penalties), UDP was
    >> way faster because the overhead was so much less.
    >>
    >> Sorry if this seems like a pretty basic question, but can you explain
    >> this?
    >
    >
    > Sure:
    >
    > 1) There is no such thing as no packet loss.
    >
    > 2) The UDP packets are reassembled in a reassembly queue
    > on the receiver. While this is happening, you can only
    > have one datagram outstanding at a time. With TCP, you
    > get a sliding window; with UDP, you stall waiting for
    > the reassembly, effectively giving you a non-sliding
    > window (request/response, with round trip latencies per
    > packet, instead of two of them amortized across a 100M
    > file transfer).
    >
    > 3) When a packet is lost, the UDP retransmit code is rather
    > crufty. It resends the whole series of packets, and you
    > eat the overhead for that. TCP, on the other hand, can
    > do selective acknowledgement, or, if it's not supported
    > by both ends, it can at least acknowledge the packets
    > that did get through, saving you a retransmit.
    >
    > 4) FreeBSD's UDP fragment reassembly buffer code is well
    > known to pretty much suck. This is true of most UDP
    > fragment reassembly code in the universe, however, and
    > is not that specific to FreeBSD. So sending UDP packets
    > that get fragged because they're larger than your MTU is
    > not a very clever way of achieving a fixed window size
    > larger than the MTU (see also #2, above, for why you do
    > not want to use an effectively fixed window protocol
    > anyway).
    >
    > Even if there were no packet loss at all with UDP, unless all
    > your data is around the size of one rsize/wsize/packet, the
    > combined RTT overhead for even a moderately large number of
    > packets in a single run is enough to trigger the amortized cost
    > of the additional TCP overhead being lower than the UDP overhead
    > from the latency. Depending on your hardware (switch latency,
    > half duplex, etc.), you could also be talking about a significant
    > combined bandwidth delay product.
    >
    > Now add to all this that you have to send explicit ACKs with UDP,
    > while you can use piggy-back ACKs on the return payloads for TCP.
    >
    > I think the idea that UDP was OK for nearly-lossless short-haul
    > came about from people who couldn't code a working TCP NFS client.
    >
    >
    > Eric Anderson wrote:
    >
    >> Willem Jan Withagen wrote:
    >>
    >>> Hi,
    >>>
    >>> I had no responses to my recent question on the difference between
    >>> NFS over UDP and TCP. So perhaps nobody cares??
    >>>
    >>> So I tried searching but have not found much yet.
    >>> Does anybody know where to find the white papers SUN once wrote
    >>> about tuning NFS??? They should be at sun, but where??
    >>> All other suggestions to read are welcomed as well.
    >>>
    >>> Given my last posting I'm building two machines to do some NFS
    >>> benchmark testing on.
    >>> Suggestions on what people "always wanted to know (tm)" are also
    >>> welcome, and I'll see if I can get them integrated.
    >>> I've found the benchmarks in /usr/ports, some might do some nice work
    >>> as well.
    >>>
    >>> If people are interested I'll keep them posted in performance@
    >>>
    >>
    >> I'm definitely interested in what you find. I run a few heavily used
    >> FreeBSD NFS servers, and am therefore always looking for tweaks and knobs
    >> to turn to make things better. In my experience, UDP has always been
    >> faster than TCP on NFS performance. Prior to 5.2, I have also seen
    >> mbuf related issues (all pretty much solvable with the right sysctl's).
    >> Let me know if I can help.
    >>
    >> Eric
    >>
    >>
    >

    I wasn't even sure where to start or stop snipping on this mail, since
    it is all good stuff - so I didn't. :) Thanks for the great info, and
    good explanations. NFS+TCP is very nice, but I do believe the UDP
    transport was faster on a handful of tests (however, I typically force
    use of TCP when I can).

    One question - what does net.inet.ip.check_interface=0 do?

    Eric

    -- 
    ------------------------------------------------------------------
    Eric Anderson	   Systems Administrator      Centaur Technology
    All generalizations are false, including this one.
    ------------------------------------------------------------------
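Point 2 of the quoted explanation (one datagram outstanding at a time versus a sliding window) can be put into rough numbers. The sketch below is a back-of-the-envelope model, not a measurement: the 8 KB datagram size, 0.5 ms round trip, and 100 Mbit/s link are assumed values chosen for illustration, and the model ignores header overhead, packet loss, and TCP slow start. The function names are hypothetical.

```python
import math


def stop_and_wait_seconds(total_bytes, datagram_bytes, rtt_s, link_bps):
    """Transfer time when only one datagram may be outstanding at a time.

    Every datagram pays a full round trip before the next one is sent.
    """
    datagrams = math.ceil(total_bytes / datagram_bytes)
    wire_time = total_bytes * 8 / link_bps
    return wire_time + datagrams * rtt_s


def sliding_window_seconds(total_bytes, rtt_s, link_bps, fixed_rtts=2):
    """Transfer time when the window keeps the link busy.

    Only a fixed number of round trips is paid for the whole transfer
    (two here, matching the "two of them amortized" remark in the thread).
    """
    wire_time = total_bytes * 8 / link_bps
    return wire_time + fixed_rtts * rtt_s


if __name__ == "__main__":
    total = 100 * 1024 * 1024   # the "100M file" from the thread
    datagram = 8 * 1024         # assumed rsize/wsize
    rtt = 0.0005                # assumed 0.5 ms round trip
    link = 100e6                # assumed 100 Mbit/s link
    print("stop-and-wait :", stop_and_wait_seconds(total, datagram, rtt, link))
    print("sliding window:", sliding_window_seconds(total, rtt, link))
```

With these assumptions the per-datagram round trips add several seconds to the transfer, while the windowed case adds about a millisecond. Real results depend on the actual RTT and on how many requests the NFS client keeps in flight, so benchmarking remains the deciding factor.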
    _______________________________________________
    freebsd-performance@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-performance
    To unsubscribe, send any mail to "freebsd-performance-unsubscribe@freebsd.org"
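Points 3 and 4 of the quoted explanation turn on the fact that a UDP datagram larger than the MTU is split into IP fragments, and losing any one fragment loses the whole datagram. A minimal sketch of that arithmetic follows, assuming a 1500-byte MTU, a 20-byte IP header without options, independent fragment loss, and a payload size that counts everything carried inside UDP. The function names and the 0.1% loss rate are illustrative assumptions, not figures from the thread.

```python
import math


def fragments_per_datagram(payload_bytes, mtu_bytes=1500,
                           ip_header_bytes=20, udp_header_bytes=8):
    """Number of IP fragments needed for one UDP datagram."""
    # Every fragment except the last carries a multiple of 8 data bytes.
    per_fragment = (mtu_bytes - ip_header_bytes) // 8 * 8
    # The UDP header travels as data inside the first fragment.
    return math.ceil((payload_bytes + udp_header_bytes) / per_fragment)


def datagram_loss_probability(fragment_loss, fragments):
    """Chance the datagram is lost if each fragment is lost independently."""
    return 1 - (1 - fragment_loss) ** fragments


if __name__ == "__main__":
    for payload in (1024, 8192, 32768):
        n = fragments_per_datagram(payload)
        p = datagram_loss_probability(0.001, n)  # assumed 0.1% fragment loss
        print(payload, "bytes:", n, "fragments, datagram loss", p)
```

Under these assumptions an 8 KB datagram needs six fragments, so its loss rate is roughly six times the per-fragment rate, and each loss means resending all six. TCP would retransmit only the missing segment.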
    
