Re: Intel 10Gb



On Tue, May 11, 2010 at 9:51 AM, Andrew Gallatin <gallatin@xxxxxxxxxxx> wrote:
Murat Balaban [murat@xxxxxxxxxxxxx] wrote:

Much of the FreeBSD networking stack has been made parallel in order to
cope with high packet rates at 10 Gig/sec operation.

I've seen good numbers (near 10 Gig) in my tests involving TCP/UDP
send/receive with the latest Intel driver.

As far as BPF is concerned, the above statement does not hold true,
since there is still work to be done here in terms of BPF locking
and parallelism. My tests show high lock contention around the
"bpf interface lock", resulting in input errors at high packet
rates and with many bpf devices.

If you're interested in 10GbE packet sniffing at line rate on the
cheap, have a look at the Myri10GE "sniffer" interface.  This is a
special software package that takes a normal mxge(4) NIC, and replaces
the driver/firmware with a "myri_snf" driver/firmware which is
optimized for packet sniffing.

Using this driver/firmware combo, we can receive minimum-size packets
at line rate (14.8Mpps) to userspace.  You can even access this using
a libpcap interface.  The trick is that the fast paths are OS-bypass,
and don't suffer from OS overheads, like lock contention.  See
http://www.myri.com/scs/SNF/doc/index.html for details.
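
For reference, the 14.8Mpps figure follows from simple arithmetic on
the wire format. The sketch below assumes standard Ethernet framing: a
64-byte minimum frame plus 8 bytes of preamble/SFD and a 12-byte
inter-frame gap. The constant and function names are made up for
illustration.

```python
# Assumption: standard Ethernet framing overhead per frame on the wire.
LINK_BPS = 10_000_000_000      # 10GbE link speed in bits per second
MIN_FRAME_BYTES = 64           # minimum Ethernet frame, including FCS
WIRE_OVERHEAD_BYTES = 8 + 12   # preamble/SFD + inter-frame gap


def line_rate_pps(link_bps=LINK_BPS, frame_bytes=MIN_FRAME_BYTES):
    """Return the maximum packets per second for a given frame size."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_per_frame


pps = line_rate_pps()
print(f"{pps / 1e6:.2f} Mpps")  # prints "14.88 Mpps"
```

Larger frames lower the packet rate in proportion, which is why
minimum-size packets are the worst case for a capture path.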

But your timestamps will be atrocious at 10G speeds. Myricom doesn't
timestamp packets AFAIK. If you want reliable timestamps you need to
look at companies like Endace, Napatech, etc.

We do a lot of packet capture and work on bpf(4) all the time. My
biggest concern for reliable 10G packet capture is timestamps. The
call to microtime() up in catchpacket() is not going to cut it (it
barely cuts it at GigE line rate).
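
As a rough illustration of why per-packet timestamping gets hard, the
time available per minimum-size packet can be worked out from the link
speed. This is only a sketch assuming standard Ethernet framing (64-byte
frame, 8 bytes of preamble/SFD, 12-byte inter-frame gap); it does not
measure the cost of any timestamp call, and the names are illustrative.

```python
# Assumption: standard Ethernet framing overhead per frame on the wire.
WIRE_OVERHEAD_BYTES = 8 + 12   # preamble/SFD + inter-frame gap


def ns_per_packet(link_bps, frame_bytes=64):
    """Return the nanoseconds between back-to-back frames at line rate."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return bits_on_wire / link_bps * 1e9


for name, bps in (("1GbE", 1e9), ("10GbE", 10e9)):
    print(f"{name}: {ns_per_packet(bps):.1f} ns per minimum-size packet")
```

Whatever the timestamp source costs, it has to fit in that window along
with everything else done per packet, and the window shrinks tenfold
going from GigE to 10G.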

I'd be interested in doing the multi-queue bpf(4) myself (perhaps I
should ask? I don't know if non-summer-of-code folks are allowed?).
I believe the goal is not so much throughput as cache affinity. It
would be nice if, say, the listener application (libpcap) could bind
itself to the same core that the driver's queue is receiving packets
on, so that everything from catching to post-processing works with a
very warm cache (theoretically). I think that's the idea.

It would also allow multiple applications to subscribe to potentially
different queues that are doing some form of load balancing. Again,
Intel's 82599 chipset supports flow-based queues (albeit the size of
the flow table is limited).
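
To make the flow-to-queue idea concrete, here is a minimal sketch of
hashing a 5-tuple to a queue index so that every packet of a flow lands
on the same queue. It is an assumption for illustration only: it uses
CRC32 over the packed tuple and is not the hash or the flow table that
the 82599 actually implements. The function name and parameters are
hypothetical.

```python
import socket
import struct
import zlib


def flow_queue(src_ip, dst_ip, src_port, dst_port, proto, num_queues):
    """Map an IPv4 5-tuple to a queue index (illustrative hash only)."""
    key = struct.pack(
        "!4s4sHHB",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        src_port,
        dst_port,
        proto,
    )
    # Same tuple always yields the same hash, hence the same queue.
    return zlib.crc32(key) % num_queues


q = flow_queue("10.0.0.1", "10.0.0.2", 1234, 80, 6, num_queues=8)
print(f"flow assigned to queue {q}")
```

A listener subscribed to one queue would then see whole flows rather
than an arbitrary slice of packets, which is what makes per-queue
consumers practical.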

Note, zero-copy bpf(4) is your friend in all use cases at 10G speeds! :)

-aps

PS I am not sure, but Intel also supports writing packets directly
into cache (yet I thought the 82599 driver actually does a prefetch
anyway, which left me confused about why that helps).
_______________________________________________
freebsd-net@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscribe@xxxxxxxxxxx"


