Re: Advice on a multithreaded netisr patch?
- From: Ivan Voras <ivoras@xxxxxxxxxxx>
- Date: Mon, 06 Apr 2009 00:47:49 +0200
Thanks for the ideas, I will try some of them. But I'd also like some
more clarifications:
Robert Watson wrote:
On Sun, 5 Apr 2009, Ivan Voras wrote:
I'd like to understand more. If (in netisr) I have an mbuf with
headers, is this data already transferred from the card or is it
magically "not here yet"?
A lot depends on the details of the card and driver. The driver will
take cache misses on the descriptor ring entry, if it's not already in
cache, and the link layer will take a cache miss on the front of the
ethernet frame in the cluster pointed to by the mbuf header as part of
its demux. What happens next depends on your dispatch model and cache
line size. Let's make a few simplifying assumptions that are mostly true:
So, an mbuf can reference data not yet copied from the NIC hardware? I'm
specifically trying to understand what m_pullup() does.
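To make the m_pullup() question concrete: m_pullup(9) operates on an mbuf chain that is already in host memory. It arranges for the first len bytes of the chain to be contiguous in the first mbuf (copying from later mbufs as needed), so a header can be accessed through a single pointer cast, and on failure it frees the chain and returns NULL. The following is a simplified userland model of that behaviour, not the kernel implementation: struct toy_mbuf, TOY_MLEN and the helper names are hypothetical, and unlike the real m_pullup() this version never allocates a fresh first mbuf when the existing one is too small.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified mbuf: data always starts at buf[0]. */
#define TOY_MLEN 128

struct toy_mbuf {
    struct toy_mbuf *next;      /* next mbuf in the chain */
    size_t len;                 /* valid bytes in buf */
    unsigned char buf[TOY_MLEN];
};

/* Allocate one mbuf holding a copy of len bytes from src. */
static struct toy_mbuf *toy_mbuf_alloc(const void *src, size_t len)
{
    assert(len <= TOY_MLEN);
    struct toy_mbuf *m = calloc(1, sizeof(*m));
    if (m != NULL) {
        memcpy(m->buf, src, len);
        m->len = len;
    }
    return m;
}

/* Free a whole chain. */
static void toy_freem(struct toy_mbuf *m)
{
    while (m != NULL) {
        struct toy_mbuf *n = m->next;
        free(m);
        m = n;
    }
}

/*
 * Make the first len bytes of the chain contiguous in the first mbuf.
 * Returns the head, or NULL after freeing the chain if len does not
 * fit in one mbuf or the chain holds fewer than len bytes.
 */
static struct toy_mbuf *toy_pullup(struct toy_mbuf *m, size_t len)
{
    assert(m != NULL);
    if (len > TOY_MLEN) {
        toy_freem(m);
        return NULL;
    }
    while (m->len < len && m->next != NULL) {
        struct toy_mbuf *n = m->next;
        size_t want = len - m->len;
        size_t take = n->len < want ? n->len : want;

        memcpy(m->buf + m->len, n->buf, take);          /* append to head */
        m->len += take;
        memmove(n->buf, n->buf + take, n->len - take);  /* trim source */
        n->len -= take;
        if (n->len == 0) {                              /* drop emptied mbuf */
            m->next = n->next;
            free(n);
        }
    }
    if (m->len < len) {
        toy_freem(m);
        return NULL;
    }
    return m;
}
```

For example, a 14-byte Ethernet header in one mbuf followed by 20 bytes of IP header in the next becomes a single 34-byte run after toy_pullup(m, 34); no data moves to or from the NIC, only between buffers in memory.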
As the card and the OS can already process many packets per second for
something as complex as routing
(http://www.tancsa.com/blast.html), and TCP saturates swi:net at 100% of
a core, isn't this an indication that there is certainly room for
improvement even with old-fashioned single-queue NICs?
Maybe. It depends on the relative costs of local processing vs
redistributing the work, which involves schedulers, IPIs, additional
cache misses, lock contention, and so on. This means there's a period
where it can't possibly be a win, and then at some point it's a win as
long as the stack scales. This is essentially the usual trade-off in
using threads and parallelism: does the benefit of multiple parallel
execution units make up for the overheads of synchronization and data
migration?
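The trade-off above can be put into a back-of-the-envelope model. This is an illustration under simplifying assumptions, not a measurement: it assumes one ingress thread running the driver and demux, a fixed per-packet hand-off cost on that thread, a fixed per-packet migration penalty (extra cache misses) on the worker, and protocol work that scales perfectly across workers. All names are hypothetical and costs are in arbitrary time units.

```c
#include <assert.h>

/* Hypothetical per-packet costs, in arbitrary time units. */
struct pkt_costs {
    double rx;        /* driver + link-layer demux on the ingress CPU */
    double proto;     /* IP/TCP processing */
    double dispatch;  /* enqueue + wakeup/IPI cost paid by the ingress CPU */
    double migrate;   /* extra cache misses paid by the worker */
};

/* Direct dispatch: one thread does everything. Packets per time unit. */
static double direct_pps(const struct pkt_costs *c)
{
    return 1.0 / (c->rx + c->proto);
}

/*
 * Deferred dispatch to nworkers: the ingress CPU and the worker pool form
 * a pipeline, so throughput is limited by the slower stage. Assumes the
 * protocol work scales perfectly (no lock contention between workers).
 */
static double deferred_pps(const struct pkt_costs *c, int nworkers)
{
    assert(nworkers > 0);
    double ingress = 1.0 / (c->rx + c->dispatch);
    double workers = nworkers / (c->proto + c->migrate);
    return ingress < workers ? ingress : workers;
}
```

With rx = 1, proto = 4, dispatch = 0.5 and migrate = 1.5, a single worker loses to direct dispatch (1/5.5 vs 1/5 packets per unit), two workers win (2/5.5 vs 1/5), and the ingress stage caps any further gain at 1/1.5 however many workers are added. Lock contention would make the worker term grow more slowly than nworkers.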
Do you have any idea at all why I'm seeing the weird difference between
netstat packets per second (250,000) and my application's TCP
performance (< 1,000 pps)? Summary: each packet is guaranteed to be a
whole message causing a transaction in the application; without the
changes I see pps almost identical to tps. Even if the source of netstat
statistics somehow manages to count packets multiple times (I don't see
how that can happen), no such relation could explain a difference this
large. It almost looks like something in the upper layers is discarding
packets (also not likely: TCP timeouts would occur and the application
wouldn't be able to push 250,000 pps), but what? Where should I look?
FYI, the localhost case is a bit weird -- I think we have some
scheduling issues that are causing loopback netisr stuff to be
pessimally scheduled. Here are some suggestions for things to try and
see if they help, though:
- Comment out all ifnet, IP, and TCP global statistics in your local
  stack -- especially look for things like tcpstat.whatever++;.
You mean for the general code? I purposely don't lock my statistics
variables because I'm not that interested in exact numbers (orders of
magnitude are relevant). As far as I understand, unlocked "x++" should
be trivially fast in this case?
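On the unlocked x++ question: the increment instruction itself is cheap, but a global counter updated by several CPUs lives in one cache line, and each increment from a different CPU must pull that line over in exclusive state, so the cost is cache-line bouncing rather than locking (unlocked increments from several CPUs can also lose updates). A common alternative, roughly what FreeBSD later formalized as counter(9), is one counter per CPU, each on its own cache line, summed only when read. The sketch below models this in userland with POSIX threads standing in for CPUs; the 64-byte line size, NWORKERS and all names are assumptions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache-line size */
#define NWORKERS   4    /* hypothetical number of "CPUs" */

/* One slot per worker; the alignment pads each slot to its own line. */
struct stat_slot {
    _Alignas(CACHE_LINE) uint64_t packets;
};

static struct stat_slot pkt_stats[NWORKERS];

struct worker_arg {
    int id;
    uint64_t npackets;
};

static void *worker(void *p)
{
    const struct worker_arg *a = p;
    for (uint64_t i = 0; i < a->npackets; i++)
        pkt_stats[a->id].packets++;   /* only this thread writes this line */
    return NULL;
}

/* Readers sum all slots; here this runs only after the writers finish. */
static uint64_t stats_total(void)
{
    uint64_t sum = 0;
    for (int i = 0; i < NWORKERS; i++)
        sum += pkt_stats[i].packets;
    return sum;
}

static uint64_t run_workers(uint64_t per_worker)
{
    pthread_t tid[NWORKERS];
    struct worker_arg args[NWORKERS];

    for (int i = 0; i < NWORKERS; i++) {
        pkt_stats[i].packets = 0;
        args[i] = (struct worker_arg){ .id = i, .npackets = per_worker };
        int rc = pthread_create(&tid[i], NULL, worker, &args[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(tid[i], NULL);
    return stats_total();
}
```

Because each slot has a single writer, no atomics or locks are needed and no increments are lost; the price is a slightly more expensive read, which suits statistics that are written per packet but read rarely.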
- Use cpuset to pin ithreads, the netisr, and whatever else, to specific
  cores so that they don't migrate, and if your system uses HTT,
  experiment with pinning the ithread and the netisr on different threads
  on the same core, or at least, different cores on the same die.
I'm using em hardware; I still think there's a possibility I'm fighting
the driver in some cases but this has priority #2.
- Experiment with using just the source IP, the source + destination IP,
  and both IPs plus TCP ports in your hash.
Ok. Currently I'm using ip1+ip2+port1+port2.
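The three hash choices differ only in which header fields feed the hash. The sketch below picks a netisr worker queue from a flow key under each mode; the struct, enum and function names are hypothetical, and the mixer is the MurmurHash3 32-bit finalizer chosen for illustration, not what FreeBSD uses. Note that a plain sum such as ip1+ip2+port1+port2 is symmetric, so both directions of a connection land on the same queue, but any two flows whose fields add up to the same total collide (e.g. ports 1000/2001 and 1001/2000 between the same two hosts).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key, fields in host byte order. */
struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

enum hash_mode { HASH_SRC_IP, HASH_SRC_DST_IP, HASH_4TUPLE };

/* MurmurHash3 fmix32: a cheap bijective bit mixer. */
static uint32_t fmix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Hash only the fields selected by mode. */
static uint32_t flow_hash(const struct flow_key *k, enum hash_mode mode)
{
    uint32_t h = fmix32(k->src_ip);
    if (mode == HASH_SRC_DST_IP || mode == HASH_4TUPLE)
        h = fmix32(h ^ k->dst_ip);
    if (mode == HASH_4TUPLE)
        h = fmix32(h ^ (((uint32_t)k->src_port << 16) | k->dst_port));
    return h;
}

/* Map a flow to one of nqueues worker queues. */
static unsigned flow_queue(const struct flow_key *k, enum hash_mode mode,
    unsigned nqueues)
{
    assert(nqueues > 0);
    return flow_hash(k, mode) % nqueues;
}
```

Hashing only the source IP keeps every connection from one client on one queue, which spreads load poorly when there are few clients; adding the ports spreads individual connections across queues while still keeping each connection on a single queue, so per-flow packet ordering is preserved.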
- If your card supports RSS, pass the flowid up the stack in the mbuf
  packet header flowid field, and use that instead of the hash for work
  placement.
Don't know about em. Don't really want to touch it if I don't have to :)
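The RSS suggestion amounts to preferring a hash the NIC already computed over one calculated in software. A minimal sketch, assuming a packet header that carries an optional hardware flow identifier: struct pkt_meta, its has_hw_flowid and hw_flowid fields, and the fallback hash are hypothetical stand-ins, not the actual mbuf packet-header field or flag names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-packet metadata, loosely modeled on an mbuf pkthdr. */
struct pkt_meta {
    bool     has_hw_flowid;   /* set by a driver whose NIC computed RSS */
    uint32_t hw_flowid;       /* NIC-provided flow hash */
    uint32_t src_ip, dst_ip;  /* used only by the software fallback */
    uint16_t src_port, dst_port;
};

/* Hypothetical software fallback: multiplicative hash of the 4-tuple. */
static uint32_t sw_flow_hash(const struct pkt_meta *p)
{
    uint32_t h = p->src_ip * 0x9e3779b1u;
    h = (h ^ p->dst_ip) * 0x9e3779b1u;
    h = (h ^ (((uint32_t)p->src_port << 16) | p->dst_port)) * 0x9e3779b1u;
    return h ^ (h >> 16);
}

/* Choose a worker queue, skipping the software hash when the NIC gave one. */
static unsigned pick_queue(const struct pkt_meta *p, unsigned nqueues)
{
    assert(nqueues > 0);
    uint32_t h = p->has_hw_flowid ? p->hw_flowid : sw_flow_hash(p);
    return h % nqueues;
}
```

Besides saving the hash computation, using the NIC's value can avoid reading the IP and TCP headers on the ingress CPU just to decide where to send the packet.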