Re: Packet loss every 30.999 seconds



Back-to-back test with no Ethernet switch between the two em interfaces:
same result. The receiving side has been up > 1 day and exhibits
the problem. These are also two different servers. The small
gettimeofday() syscall tester also shows the same ~30 second
pattern of high latency between syscalls.
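
For anyone who wants to reproduce the syscall-latency check, here is a rough sketch of that kind of tester. It is not the tester used above: it is written in Python, uses time.time() as a stand-in for calling gettimeofday() directly, and the names sample_clock and find_stalls and the 1 ms threshold are made up for illustration.

```python
import time


def sample_clock(n):
    """Read the wall clock n times back to back and return the timestamps."""
    return [time.time() for _ in range(n)]


def find_stalls(timestamps, threshold):
    """Return (timestamp, gap) for each pair of consecutive samples
    that are more than `threshold` seconds apart."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > threshold:
            stalls.append((cur, gap))
    return stalls


if __name__ == "__main__":
    # Assumed threshold: anything over 1 ms between two clock reads is a stall.
    stamps = sample_clock(100_000)
    for when, gap in find_stalls(stamps, 0.001):
        print(f"{when:.6f} stall of {gap * 1000:.3f} ms")
```

To see a ~30 second pattern the sampling loop has to run for several minutes, so a real run would repeat sample_clock() in a loop rather than take a single batch.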

Receiver test application reports 3699 missed packets

Sender netstat -i:

(before test)
Name  Mtu   Network   Address            Ipkts  Ierrs  Opkts     Oerrs  Coll
em1   1500  <Link#2>  00:04:23:cf:51:b7  20     0      15975785  0      0
em1   1500  10.1/24   10.1.0.2           37     -      15975801  -      -

(after test)
Name  Mtu   Network   Address            Ipkts  Ierrs  Opkts     Oerrs  Coll
em1   1500  <Link#2>  00:04:23:cf:51:b7  22     0      25975822  0      0
em1   1500  10.1/24   10.1.0.2           39     -      25975838  -      -

total IP packets sent during test = end - start
25975838-15975801 = 10000037 (expected, 10,000,000 packet test + overhead)

Receiver netstat -i:

(before test)
Name  Mtu   Network   Address            Ipkts     Ierrs  Opkts  Oerrs  Coll
em1   1500  <Link#2>  00:04:23:c4:cc:89  15975785  0      21     0      0
em1   1500  10.1/24   10.1.0.1           15969626  -      19     -      -

(after test)
Name  Mtu   Network   Address            Ipkts     Ierrs  Opkts  Oerrs  Coll
em1   1500  <Link#2>  00:04:23:c4:cc:89  25975822  0      23     0      0
em1   1500  10.1/24   10.1.0.1           25965964  -      21     -      -

total ethernet frames received during test = end - start
25975822-15975785 = 10000037 (as expected)

total IP packets processed during test = end - start
25965964-15969626 = 9996338 (expecting 10000037)

Missed packets = expected - received
10000037-9996338 = 3699

netstat -i accounts for the same 3699 missed packets that the
application reports.
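
The counter arithmetic above can be rechecked with a few lines of Python. The numbers are copied from the netstat -i output in this message; the variable names and the delta helper are only for illustration.

```python
def delta(before, after):
    """Counter increase over the test run."""
    return after - before


# Counters copied from the netstat -i output above.
sent_ip = delta(15975801, 25975838)    # sender, IP packets sent (10.1/24 row)
recv_link = delta(15975785, 25975822)  # receiver, ethernet frames received (<Link#2> row)
recv_ip = delta(15969626, 25965964)    # receiver, IP packets processed (10.1/24 row)

# Packets seen at the link layer that never showed up in the IP counter.
missed = sent_ip - recv_ip

print("IP packets sent:       ", sent_ip)
print("frames received:       ", recv_link)
print("IP packets processed:  ", recv_ip)
print("missed:                ", missed)
```

The link-level count on the receiver matches what the sender put on the wire, so the shortfall only appears between the link counter and the IP counter.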

Looking closer at the tester output again shows the periodic
~30 second windows of packet loss.
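
One way to pull the loss windows out of a receiver log is sketched below. It assumes the tester records an (arrival_time, sequence_number) pair for every packet it receives, with sequence numbers increasing by one on the sender; that log format and the function names are assumptions, not a description of the actual tester.

```python
def loss_events(records):
    """records: (arrival_time, seq) pairs in arrival order.
    Returns (arrival_time, missing_count) for every gap in the sequence."""
    events = []
    prev_seq = None
    for arrival, seq in records:
        if prev_seq is not None and seq > prev_seq + 1:
            events.append((arrival, seq - prev_seq - 1))
        prev_seq = seq
    return events


def spacing(events):
    """Seconds between the start of consecutive loss events."""
    times = [t for t, _ in events]
    return [b - a for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the input.
    log = [(0.0, 1), (0.1, 2), (0.2, 5), (31.2, 6), (31.3, 9)]
    events = loss_events(log)
    print(events)
    print(spacing(events))
```

If the loss is periodic, the values returned by spacing() should cluster around one interval instead of being spread out.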

There's a second problem here in that packets are just disappearing
before they make it to ip_input(), or there's a dropped packets
counter I've not found yet.

I can provide remote access to anyone who wants to take a look; this
is very easy to duplicate. The ~1 day of uptime needed before the
behavior surfaces is not making this easy to isolate.

--
mark

On Dec 17, 2007, at 12:43 AM, Jeremy Chadwick wrote:

On Mon, Dec 17, 2007 at 12:21:43AM -0500, Mark Fullmer wrote:
While trying to diagnose a packet loss problem in a RELENG_6 snapshot dated
November 8, 2007, it looks like I've stumbled across a broken driver or
kernel routine which stops interrupt processing long enough to severely
degrade network performance every 30.99 seconds.

Packets appear to make it as far as ether_input() then get lost.

Are you sure this isn't being caused by something the switch is doing,
such as MAC/ARP cache clearing or LACP? I'm just speculating, but it
would be worthwhile to remove the switch from the picture (crossover
cable to the rescue).

I know that at least in the case of fxp(4) and em(4), Jack Vogel does
some thorough testing of throughput using a professional/high-end packet
generator (some piece of hardware, I forget the name...)

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |

_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"


_______________________________________________
freebsd-net@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscribe@xxxxxxxxxxx"


