Re: Advice on a multithreaded netisr patch?



Thanks for the ideas, I will try some of them. But I'd also like some
more clarifications:

Robert Watson wrote:
On Sun, 5 Apr 2009, Ivan Voras wrote:

I'd like to understand more. If (in netisr) I have an mbuf with
headers, is this data already transferred from the card or is it
magically "not here yet"?

A lot depends on the details of the card and driver. The driver will
take cache misses on the descriptor ring entry, if it's not already in
cache, and the link layer will take a cache miss on the front of the
ethernet frame in the cluster pointed to by the mbuf header as part of
its demux. What happens next depends on your dispatch model and cache
line size. Let's make a few simplifying assumptions that are mostly true:

So, an mbuf can reference data not yet copied from the NIC hardware? I'm
specifically trying to understand what m_pullup() does.

As the card and the OS can already process many packets per second for
something as complex as routing
(http://www.tancsa.com/blast.html), and TCP chokes swi:net at 100% of
a core, isn't this an indication that there's certainly more room for
improvement even with single-queue, old-fashioned NICs?

Maybe. It depends on the relative costs of local processing vs
redistributing the work, which involves schedulers, IPIs, additional
cache misses, lock contention, and so on. This means there's a period
where it can't possibly be a win, and then at some point it's a win as
long as the stack scales. This is essentially the usual trade-off in
using threads and parallelism: does the benefit of multiple parallel
execution units make up for the overheads of synchronization and data
migration?

Do you have any idea at all why I'm seeing the weird difference between
netstat packets per second (250,000) and my application's TCP
performance (< 1,000 pps)? Summary: each packet is guaranteed to be a
whole message causing a transaction in the application - without the
changes I see pps almost identical to tps. Even if the source of netstat
statistics somehow manages to count packets multiple times (I don't see
how that can happen), no relation can explain differences this huge. It
almost looks like something in the upper layers is discarding packets
(also not likely: TCP timeouts would occur and the application wouldn't
be able to push 250,000 pps) - but what? Where to look?

FYI, the localhost case is a bit weird -- I think we have some
scheduling issues that are causing loopback netisr stuff to be
pessimally scheduled. Here are some suggestions for things to try and
see if they help, though:

- Comment out all ifnet, IP, and TCP global statistics in your local
  stack -- especially look for things like tcpstat.whatever++;.

You mean for the general code? I purposely don't lock my statistics
variables because I'm not that interested in exact numbers (orders of
magnitude are relevant). As far as I understand, unlocked "x++" should
be trivially fast in this case?
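
An illustration of why an unlocked "x++" on a shared statistic may still
be expensive: when several cores increment the same variable, the cache
line holding it has to move between cores on every update, and
concurrent increments can also be lost. A common alternative is one
counter slot per CPU, each on its own cache line, summed only when the
statistic is read. The following is a minimal userland sketch of that
idea, not code from the FreeBSD stack; the names percpu_counter,
count_packet and total_packets, the 64-byte line size and the CPU count
are all assumptions made for the example.

```c
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache line size in bytes */
#define NCPU       4    /* assumed number of CPUs for this sketch */

/*
 * One counter per CPU.  The alignment makes each slot occupy a whole
 * cache line, so two CPUs never write to the same line.
 */
struct percpu_counter {
    _Alignas(CACHE_LINE) uint64_t value;
};

static struct percpu_counter pkts_received[NCPU];

/* Hot path: each CPU touches only its own slot (hypothetical helper). */
static void
count_packet(int cpu)
{
    pkts_received[cpu].value++;
}

/* Slow path: add up all slots when the statistic is requested. */
static uint64_t
total_packets(void)
{
    uint64_t sum = 0;

    for (int i = 0; i < NCPU; i++)
        sum += pkts_received[i].value;
    return sum;
}
```

The total read this way is only approximate while packets are flowing,
which matches the "orders of magnitude" requirement stated above.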

- Use cpuset to pin ithreads, the netisr, and whatever else, to specific
  cores so that they don't migrate, and if your system uses HTT,
  experiment with pinning the ithread and the netisr on different
  threads on the same core, or at least, different cores on the same die.

I'm using em hardware; I still think there's a possibility I'm fighting
the driver in some cases but this has priority #2.

- Experiment with using just the source IP, the source + destination IP,
  and both IPs plus TCP ports in your hash.

Ok. Currently I'm using ip1+ip2+port1+port2.
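
For comparison, the three hash inputs suggested above can be sketched as
one function with a mode switch. This is only an illustration under
assumptions: struct flow, enum hash_mode and pick_worker are
hypothetical names, and the mixing step is the 32-bit finalizer from
MurmurHash3, swapped in here as a stand-in for whatever the patch
actually uses. A mixing step is worth considering because plain addition
of the fields tends to map similar addresses and ports to neighbouring
values.

```c
#include <stdint.h>

/* Hypothetical flow description; addresses and ports as plain integers. */
struct flow {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

enum hash_mode { HASH_SRC, HASH_SRC_DST, HASH_4TUPLE };

/* 32-bit finalizer from MurmurHash3: spreads nearby inputs apart. */
static uint32_t
mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bU;
    h ^= h >> 13;
    h *= 0xc2b2ae35U;
    h ^= h >> 16;
    return h;
}

/* Map a flow to a worker index in [0, nworkers); nworkers must be > 0. */
static unsigned
pick_worker(const struct flow *f, enum hash_mode mode, unsigned nworkers)
{
    uint32_t h = mix32(f->src_ip);

    if (mode >= HASH_SRC_DST)
        h = mix32(h ^ f->dst_ip);
    if (mode == HASH_4TUPLE)
        h = mix32(h ^ (((uint32_t)f->src_port << 16) | f->dst_port));
    return h % nworkers;
}
```

With HASH_SRC or HASH_SRC_DST, all connections between the same pair of
hosts land on one worker, so a test with a single client only spreads
across workers in HASH_4TUPLE mode.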

- If your card supports RSS, pass the flowid up the stack in the mbuf
  packet header flowid field, and use that instead of the hash for work
  placement.

Don't know about em. Don't really want to touch it if I don't have to :)
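
To make the flowid suggestion concrete, here is a minimal sketch of
placement that prefers a hardware-supplied flow ID and falls back to a
software hash when the card did not provide one. struct pkt is a toy
stand-in, not the real struct mbuf, and place_packet, has_flowid and
sw_hash are names assumed for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy packet header; models only the fields this sketch needs. */
struct pkt {
    bool     has_flowid;  /* true if the NIC/driver filled in flowid */
    uint32_t flowid;      /* flow ID computed by the hardware (e.g. RSS) */
    uint32_t sw_hash;     /* hash computed in software by the stack */
};

/* Choose a worker in [0, nworkers); nworkers must be > 0. */
static unsigned
place_packet(const struct pkt *p, unsigned nworkers)
{
    uint32_t key = p->has_flowid ? p->flowid : p->sw_hash;

    return key % nworkers;
}
```

The point of the fallback is that a driver without RSS support needs no
changes: its packets simply take the software-hash path.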





