Re: When did I lose packets?



Spoon <root@xxxxxxxxx> wrote:
There aren't many places where I could lose packets:
1. the sender (in my case, unlikely)

But you should still check the link-level statistics on the sending
side with ethtool.

2. the network (in my case, unlikely)

But you should still check the port stats on the switch.

3. the receiver

And you should check the link-level and transport-level stats with
ethtool and netstat respectively.

I imagine it is possible for the receiver's kernel to be too busy to
service the network interrupts. But wouldn't ifconfig report that?

errors:0 dropped:0 overruns:0 frame:0

The above makes it look like the NIC didn't drop any packet...

Ifconfig would not report drops resulting from the UDP socket buffer
being full. That is a "transport level" statistic and ifconfig is
interface/link-level.
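
As a rough illustration of where the transport-level counters live: on
Linux, netstat takes its UDP statistics from /proc/net/snmp. The sketch
below parses that file's "Udp:" header/value pair into a dict. The
function names are made up for this example, and the exact set of
counters (RcvbufErrors in particular) depends on the kernel version, so
treat the field names as assumptions to verify on your own system.

```python
def parse_udp_counters(snmp_text):
    """Parse the 'Udp:' header/value line pair from /proc/net/snmp-style text."""
    # The file holds two lines per protocol: one with names, one with values.
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Udp:")]
    if len(rows) < 2:
        raise ValueError("no Udp header/value pair found")
    names, values = rows[0][1:], rows[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def read_udp_counters(path="/proc/net/snmp"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_udp_counters(f.read())
```

Calling read_udp_counters() before and after a test run and comparing
the "InErrors" and (where present) "RcvbufErrors" values should show
whether datagrams were dropped at the socket rather than at the NIC.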

I noticed that I lose packets in bursts of 30-100 packets, and these
loss bursts are quite rare (~1 every 10-40 minutes). Someone told me
another high-priority process (another SCHED_FIFO??) might be running.

Well, this will happen in every OS with a normal (non-realtime)
scheduler.

What will happen? The receiver is run as a SCHED_FIFO process. I don't
think any other SCHED_FIFO processes run in a default Linux build?

Only kernel threads might keep the CPU away from the receiver process.
And they are supposed to execute fast, right?

Fast is a relative thing.

I turned syslogd and klogd off.
I'm still dropping packets (420 in 77 million).

Personally, I think it is a pipe dream for anyone to think they'll be
able to have _zero_ drops when using UDP without explicit backpressure
flow control. Still, if you have these "once in a lifetime" events
that cause a handful of datagrams to be lost, but otherwise the
receiver keeps up without breaking a sweat - i.e. the CPU utilization
on the receiver is nearly nil - you could consider increasing the
SO_RCVBUF size so it is large enough to hold the datagrams that arrive
during the interval in which your application is prevented from
running. You may need to raise the system-level limits via sysctl
first. Definitely make a getsockopt() call after your setsockopt()
call to see what SO_RCVBUF actually became.
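
A minimal sketch of that set-then-check sequence, in Python for
brevity (the same SOL_SOCKET/SO_RCVBUF calls exist in C). The helper
names and the sizing formula are illustrative assumptions, not
something from the original exchange. On Linux the value read back is
typically double what was requested and is capped by the
net.core.rmem_max sysctl, so a mismatch with the request is expected.

```python
import math
import socket


def burst_buffer_bytes(datagrams_per_sec, stall_seconds, bytes_per_datagram):
    """Estimate the payload bytes that arrive while the receiver is stalled."""
    # Lower bound only: the kernel also charges per-datagram overhead.
    return math.ceil(datagrams_per_sec * stall_seconds) * bytes_per_datagram


def request_rcvbuf(sock, nbytes):
    """Ask for an nbytes receive buffer; return the size the kernel reports."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, nbytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

For example, request_rcvbuf(sock, burst_buffer_bytes(2000, 0.5, 1000))
asks for room for half a second of 1000-byte datagrams at 2000 per
second. If the returned figure is far below the request, the
system-wide limit is the likely culprit.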

rick jones
http://www.netperf.org/
--
The glass is neither half-empty nor half-full. The glass has a leak.
The real question is "Can it be patched?"
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...