Re: sendfile(2) SF_NOPUSH flag proposal

From: Terry Lambert (tlambert2_at_mindspring.com)
Date: 05/28/03

    Date: Tue, 27 May 2003 22:54:53 -0700
    To: Igor Sysoev <is@rambler-co.ru>
    
    

    Igor Sysoev wrote:
    > How do you propose to coalesce the file pages? Wire two or more pages
    > to mbufs at once?

    It's done by the network driver, using the network card's DMA
    scatter/gather.

    > Terry, I do not understand you.
    > My argument is simple - I want to avoid the partial packets because it
    > decreases the number of packets. That's all. There's nothing about
    > amortized cost or total cost. I do not even know what they are.

    The total cost is the total overhead in packets to send a
    given amount of data. For a small amount of data, the total
    cost is small, compared to the overhead involved in sending
    the Ethernet, IP, and TCP headers.

    The amortized cost is how much an extra packet costs you to
    send, relative to what you have to send anyway. If you have
    a lot of data to send, sending an extra packet or two is really
    not very costly, since it's just one more packet out of hundreds.

    If you argue there's a tiny amount of data, then the total
    cost is important.

    If you argue there's a lot of data, then the amortized cost
    is important.

    When you talk about extra packets being sent, you can't claim
    that the amortized cost is important for a small amount of data,
    or that the total cost is important for a huge amount of data.
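
    To put rough numbers on the two costs, here is a minimal sketch. It
    assumes a 1460-byte MSS and 54 bytes of Ethernet, IPv4, and TCP
    headers per packet with no options; the function names are made up
    for illustration and are not from any kernel source.

```c
#include <stddef.h>

/* Assumed sizes: Ethernet II (14) + IPv4 (20) + TCP (20), no options. */
enum { HDR_BYTES = 14 + 20 + 20, MSS = 1460 };

/* Packets needed for `payload` bytes when only the last packet is partial. */
static size_t packets_needed(size_t payload)
{
    return (payload + MSS - 1) / MSS;
}

/* "Total cost": header bytes spent to move `payload` bytes. */
static size_t header_overhead(size_t payload)
{
    return packets_needed(payload) * HDR_BYTES;
}

/* "Amortized cost": one extra packet as a fraction of the packets
 * already required. `payload` must be greater than zero. */
static double extra_packet_fraction(size_t payload)
{
    return 1.0 / (double)packets_needed(payload);
}
```

    Under these assumptions, 100 bytes of data needs one packet and 54
    bytes of headers, so one extra packet doubles the packet count. A
    1,000,000-byte transfer needs 685 packets, so one extra packet adds
    roughly 0.15%.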

    Your focus on number of packets, rather than your ability to
    move a total amount of data at or near the theoretical maximum,
    makes no sense.

    > > Actually, in this case, I'd just try to fix sendfile(2) to
    > > do the packet coalescing I'd expect, given the relative
    > > state of the TCP_NODELAY and TCP_NOPUSH options flags.
    >
    > Actually, sendfile() already works according to TCP_NOPUSH flag.
    > I do not know about TCP_NODELAY - I do not work with it.
    > But if you turn TCP_NOPUSH on then sendfile() will send the full packets.
    > If you turn TCP_NOPUSH off then sendfile() will send some packets partially
    > filled. It's correct.

    Sending some packets partially filled, instead of just the
    last packet in a series partially filled, is *wrong*, IMO.
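
    As a rough illustration of the difference, the sketch below counts
    segments when the HTTP header and the file are each pushed on their
    own, versus when they are coalesced into one stream. The 1460-byte
    MSS and the helper names are assumptions for the example only.

```c
#include <stddef.h>

enum { SEG = 1460 };  /* assumed MSS in bytes */

/* Segments needed for n bytes sent as one contiguous stream. */
static size_t segs_for(size_t n)
{
    return (n + SEG - 1) / SEG;
}

/* Header and file pushed separately: each may end in a partial segment. */
static size_t segs_separate(size_t hdr, size_t file)
{
    return segs_for(hdr) + segs_for(file);
}

/* Header and file coalesced: only the final segment may be partial. */
static size_t segs_coalesced(size_t hdr, size_t file)
{
    return segs_for(hdr + file);
}
```

    With a 300-byte header and a 4000-byte file this gives 4 segments
    versus 3. With the same header and a 1,000,000-byte file it gives
    686 either way.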

    > > BTW: I'm still wary of the initial fault on the file data, if
    > > it's not already in cache: arguably, it's better to start
    > > sending the headers, and avoid the startup latency of delaying
    > > sending the headers until the fault is satisfied: part of the
    > > thing that's going to be eating your PCI bandwidth is the
    > > disk I/O, and your disks are going to be the slowest data
    > > sources/sinks in the whole equation.
    >
    > I agree but after all it's 20ms or so delay.

    Plus the delay for the NETISR.

    > > In any case, I expect that this should be handled in the
    > > context of TCP_NODELAY and TCP_NOPUSH, rather than by adding
    > > options to work around an arguably broken sendfile(2).
    >
    > sendfile() already works nicely with TCP_NOPUSH. I propose only flags
    > that allow turning TCP_NOPUSH (actually TF_NOPUSH) on/off inside sendfile().
    > Then in one syscall you can turn TCP_NOPUSH on, send the HTTP header and the
    > file pages, and turn TCP_NOPUSH off if all file pages are wired to mbufs.
    > And this TCP_NOPUSH state is not bound by sendfile() internals, you
    > can control it via setsockopt/getsockopt(TCP_NOPUSH).

    You're wrong about what TCP_NOPUSH is for; it's only for the
    last packet of one system call being concatenated with the
    first packet of another, to save empty packets between separate
    system calls.

    When you call sendfile with a file, headers, and trailers, you
    are making *only one system call*.

    "man 4 tcp" tells us:

         TCP_NOPUSH    By convention, the sender-TCP will set the ``push'' bit and
                       begin transmission immediately (if permitted) at the end of
                       every user call to write(2) or writev(2). The TCP_NOPUSH
                       option is provided to allow servers to easily make use of
                       Transaction TCP (see ttcp(4)). When the option is set to a
                       non-zero value, TCP will delay sending any data at all
                       until either the socket is closed, or the internal send
                       buffer is filled.

    FWIW, here's what it tells us about TCP_NODELAY:

         TCP_NODELAY   Under most circumstances, TCP sends data when it is
                       presented; when outstanding data has not yet been
                       acknowledged, it gathers small amounts of output to be
                       sent in a single packet once an acknowledgement is
                       received. For a small number of clients, such as window
                       systems that send a stream of mouse events which receive
                       no replies, this packetization may cause significant
                       delays. The boolean option TCP_NODELAY defeats this
                       algorithm.

    IMO, sendfile(2) should be acting the way you want it to act
    just by your *NOT* setting TCP_NODELAY.

    If you *do* set TCP_NOPUSH, then it should delay sending the
    last partial packet until the timer fires, or until you write(2),
    writev(2), sendfile(2), or send/sendto/sendmsg(2) more data.
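
    For reference, toggling the option from user space might look like
    the sketch below. The helper names are hypothetical. TCP_NOPUSH is
    the BSD name; on systems that lack it, the sketch falls back to
    Linux's TCP_CORK, which is only a rough stand-in and differs in
    detail.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Prefer the BSD option; TCP_CORK is an assumed stand-in elsewhere. */
#if defined(TCP_NOPUSH)
#define HOLD_OPT TCP_NOPUSH
#elif defined(TCP_CORK)
#define HOLD_OPT TCP_CORK
#else
#error "neither TCP_NOPUSH nor TCP_CORK is available"
#endif

/* Hypothetical helper: set the option on (1) or off (0).
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
static int set_hold(int sock, int on)
{
    return setsockopt(sock, IPPROTO_TCP, HOLD_OPT, &on, sizeof(on));
}

/* Hypothetical helper: read the option back.
 * Returns 1 if set, 0 if clear, -1 on error. */
static int get_hold(int sock)
{
    int v = 0;
    socklen_t len = sizeof(v);

    if (getsockopt(sock, IPPROTO_TCP, HOLD_OPT, &v, &len) == -1)
        return -1;
    return v != 0;
}
```

    A caller would set the option before the first write, send the
    data, and clear it afterwards. Whether clearing it flushes a held
    partial packet immediately depends on the system.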

    NOTE: TCP_NOPUSH *specifically* mentions writev(2), which, like
    sendfile(2), takes data from multiple discrete buffers and sends
    it.

    Make sense now? You think sendfile(2) needs options; I think
    sendfile(2) is broken.

    -- Terry
    _______________________________________________
    freebsd-arch@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-arch
    To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"

