Re: Advice on a multithreaded netisr patch?
- From: Barney Cordoba <barney_cordoba@xxxxxxxxx>
- Date: Sun, 5 Apr 2009 10:25:41 -0700 (PDT)
--- On Sun, 4/5/09, Robert Watson <rwatson@xxxxxxxxxxx> wrote:
From: Robert Watson <rwatson@xxxxxxxxxxx>
Subject: Re: Advice on a multithreaded netisr patch?
To: "Ivan Voras" <ivoras@xxxxxxxxxxx>
Cc: freebsd-net@xxxxxxxxxxx
Date: Sunday, April 5, 2009, 9:54 AM
On Sun, 5 Apr 2009, Ivan Voras wrote:
I thought this has something to deal with NIC
moderation (em) but can't really explain it. The bad
performance part (not the jump) is also visible over the
loopback interface.
FYI, if you want high performance, you really want
a card supporting multiple input queues -- igb, cxgb, mxge,
etc. if_em-only cards are fundamentally less scalable in an
SMP environment because they require input or output to
occur only from one CPU at a time.
Makes sense, but on the other hand - I see people are
routing at least 250,000 packets per second per direction
with these cards, so they probably aren't the bottleneck
(pro/1000 pt on pci-e).
The argument is not that they are slower (although they
probably are a bit slower), rather that they introduce
serialization bottlenecks by requiring synchronization
between CPUs in order to distribute the work. Certainly
some of the scalability issues in the stack are not a result
of that, but a good number are.
Historically, we've had a number of bottlenecks in,
say, the bulk data receive and send paths, such as:
- Initial receipt and processing of packets on a single CPU as a result of a
  single input queue from the hardware. Addressed by using multiple input
  queue hardware with appropriately configured drivers (generally the default
  is to use multiple input queues in 7.x and 8.x for supporting hardware).
- Cache line contention on stats data structures in drivers and various levels
  of the network stack due to bouncing around exclusive ownership of the cache
  line. ifnet introduces at least a few, but I think most of the interesting
  ones are at the IP and TCP layers for receipt.
- Global locks protecting connection lists, all rwlocks as of 7.1, but not
  necessarily always used read-only for packet processing. For UDP we do a
  very good job at avoiding write locks, but for TCP in 7.x we still use a
  global write lock, if briefly, for every packet. There's a change in 8.x to
  use a global read lock for most packets, especially steady state packets,
  but I didn't merge it for 7.2 because it's not well-benchmarked. Assuming I
  get positive feedback from more people, I will merge them before 7.3.
- If the user application is multi-threaded and receiving from many threads at
  once, we see contention on the file descriptor table lock. This was
  markedly improved by the file descriptor table locking rewrite in 7.0, but
  we're continuing to look for ways to mitigate this. A lockless approach
  would be really nice...
On the transmit path, the bottlenecks are similar but different:
- Neither 7.x nor 8.x supports multiple transmit queues as shipped; Kip has
  patches for both that add it for cxgb. Maintaining ordering here, and
  ideally affinity to the appropriate associated input queue, is important.
  As the patches aren't in the tree yet, or for single-queue drivers,
  contention on the device driver send path and queues can be significant,
  especially for device drivers where the send and receive path are protected
  by the same lock (bge!).
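
To make the ordering point above concrete, here is a minimal user-space sketch, not kernel code and not taken from Kip's patches. It assumes a flow is identified by its address/port 4-tuple and that a CRC32 of that tuple is an acceptable stand-in for a flowid; the names select_tx_queue and distribute are hypothetical. Because every packet of a flow hashes to the same queue, per-flow ordering is preserved even though the queues drain independently.

```python
import zlib


def select_tx_queue(src_ip, src_port, dst_ip, dst_port, num_queues):
    """Map a flow 4-tuple to a queue index (hypothetical helper).

    The same flow always yields the same index, so packets of one flow
    never overtake each other across queues.
    """
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_queues


def distribute(packets, num_queues):
    """Place each packet on the queue chosen by its flow hash.

    Each packet is assumed to be a dict with a "flow" 4-tuple; this layout
    is an assumption made for the illustration only.
    """
    queues = [[] for _ in range(num_queues)]
    for pkt in packets:
        index = select_tx_queue(*pkt["flow"], num_queues)
        queues[index].append(pkt)
    return queues
```

A real driver would more likely reuse a hash computed by the hardware on receive, which is what gives affinity with the associated input queue; the sketch only shows the ordering property.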
I'm curious as to your assertion that hardware transmit queues are a
big win. You're really just loading a transmit ring well ahead of
actual transmission; there's no need to force a "start" for each packet
queued. You then have more overhead managing the multiple queues: more
memory used, more CPU cache needed, more interrupts (perhaps), and the
overhead of generating the flowid. It seems to me that a more
efficient method of transmitting, such as offloading the transmit
workload to a kernel task, would be more effective than using
multiple transmit queues. All the source thread has to do is queue
the packet and get out.
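
A rough user-space sketch of what I mean, assuming a single worker thread stands in for the kernel task and a plain list stands in for the transmit ring. The class name, batch_size, and the starts counter are all made up for the illustration; none of this is an existing FreeBSD interface.

```python
import queue
import threading


class DeferredTransmitter:
    """Toy model: senders enqueue, one worker loads the ring in batches."""

    def __init__(self, batch_size=8):
        self._pending = queue.Queue()   # software queue fed by source threads
        self.ring = []                  # stands in for the hardware transmit ring
        self.starts = 0                 # how many times the "hardware" was kicked
        self._batch_size = batch_size
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def send(self, packet):
        """Source-thread path: queue the packet and return immediately."""
        self._pending.put(packet)

    def close(self):
        """Flush everything queued so far and stop the worker."""
        self._pending.put(None)         # sentinel marking end of input
        self._worker.join()

    def _run(self):
        done = False
        while not done:
            batch = [self._pending.get()]           # block until work arrives
            while len(batch) < self._batch_size:    # then drain what is ready
                try:
                    batch.append(self._pending.get_nowait())
                except queue.Empty:
                    break
            if None in batch:
                done = True
                batch = batch[:batch.index(None)]
            if batch:
                self.ring.extend(batch)             # load the ring in order
                self.starts += 1                    # one "start" per batch
```

With 20 packets and batch_size=4 the worker issues somewhere between 5 and 20 starts depending on timing, rather than always 20, and the ring order matches the send order because there is a single consumer.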
As an aside, why is Kip doing development on a Chelsio card rather
than a more mainstream product such as Intel or Broadcom that would
generate more widespread interest?
Barney