Re: Packet loss every 30.999 seconds



Just to confirm the patch did not change the behavior. I ran with it
last night and double checked this morning to make sure.

It looks like if you put the check at the top of the loop and the next
node is changed during msleep(), SLIST_NEXT will walk into the trash.
I'm in over my head here...
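
To make the hazard concrete, here is a small user-space sketch in Python, not kernel code. It models one generic way to survive a sleep in the middle of a list walk: park a marker node after the current element before yielding, then resume from whatever follows the marker. All names (Node, NodeList, walk_with_yield) are hypothetical, and this is not a claim about what any FreeBSD patch does; it assumes every other walker skips marker nodes.

```python
class Node:
    def __init__(self, name, is_marker=False):
        self.name = name
        self.is_marker = is_marker
        self.prev = None
        self.next = None


class NodeList:
    """Circular doubly linked list with a sentinel head."""

    def __init__(self):
        self.head = Node("<head>")
        self.head.prev = self.head.next = self.head

    def insert_after(self, pos, node):
        node.prev, node.next = pos, pos.next
        pos.next.prev = node
        pos.next = node

    def append(self, node):
        self.insert_after(self.head.prev, node)

    def remove(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev
        # A removed node no longer points into the list; following a
        # saved pointer to it is the "walk into the trash" case.
        node.prev = node.next = None


def walk_with_yield(lst, visit, yield_every, on_yield):
    """Visit each node, calling on_yield() every yield_every nodes.

    on_yield() stands in for the sleep: it may remove any real node,
    including the current one and the one that used to follow it.
    """
    marker = Node("<marker>", is_marker=True)
    node = lst.head.next
    seen = 0
    while node is not lst.head:
        if node.is_marker:          # skip markers left by other walkers
            node = node.next
            continue
        visit(node)
        seen += 1
        if seen % yield_every == 0:
            lst.insert_after(node, marker)  # remember our place
            on_yield()                      # list may change here
            node = marker.next              # resume from the marker
            lst.remove(marker)
        else:
            node = node.next
```

The cost is an extra insert and remove per yield, plus a marker check for every walker, which is the kind of added loop overhead the thread is worried about.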

Setting kern.maxvnodes=1000 does stop both the periodic packet loss and
the high-latency syscalls, so it does look like walking this chain
without yielding the processor is part of the problem I'm seeing.

The other behavior I don't understand is why the em driver is able
to increment if_ipackets but still lose the packet.

Dumping the internal stats with dev.em.1.stats=1:

Dec 19 13:07:46 dytnq-nf1 kernel: em1: Excessive collisions = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Sequence errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Defer count = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Missed Packets = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Receive No Buffers = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Receive Length Errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Receive errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Crc errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Alignment errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Collision/Carrier extension errors = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: RX overruns = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: watchdog timeouts = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: XON Rcvd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: XON Xmtd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: XOFF Rcvd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: XOFF Xmtd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Good Packets Rcvd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: Good Packets Xmtd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: TSO Contexts Xmtd = 0
Dec 19 13:07:46 dytnq-nf1 kernel: em1: TSO Contexts Failed = 0

With FreeBSD 4 I was able to run a UDP data collector with rtprio set,
kern.ipc.maxsockbuf=20480000, then use setsockopt() with SO_RCVBUF
in the application. If packets were dropped they would show up
with netstat -s as "dropped due to full socket buffers".
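
For reference, a minimal sketch of the socket side of that setup, written in Python rather than the C the collector would use. The function name open_collector and its default values are hypothetical. It requests a large receive buffer with SO_RCVBUF and reads back what the kernel granted, since the request can be clamped or refused depending on the system limit (kern.ipc.maxsockbuf on FreeBSD).

```python
import socket


def open_collector(port=0, rcvbuf_bytes=4 * 1024 * 1024, host="127.0.0.1"):
    """Open a UDP socket and ask for a large receive buffer.

    Returns (sock, granted), where granted is the buffer size the
    kernel reports after the request.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    except OSError:
        # The request exceeded the system limit; keep the default size.
        pass
    # Always read the value back: the kernel may have adjusted it.
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    sock.bind((host, port))
    return sock, granted
```

Comparing granted with the requested size is a quick check that the sysctl limit was actually raised before the collector starts.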

Since the packet never makes it to ip_input() I no longer have
any way to count drops. There will always be corner cases where
interrupts are lost and drops not accounted for if the adapter
hardware can't report them, but right now I've got no way to
estimate any loss.

--
mark

On Dec 19, 2007, at 12:13 PM, David G Lawrence wrote:

Try it with "find / -type f >/dev/null" to duplicate the problem almost
instantly.

I was able to verify last night that (cd /; tar -cpf -) > all.tar would
trigger the problem. I'm working on getting a test running with
David's ffs_sync() workaround now; adding a few counters there should
get this narrowed down a little more.

Unfortunately, the version of the patch that I sent out isn't going to
help your problem. It needs to yield at the top of the loop, but vp isn't
necessarily valid after the wakeup from the msleep. That's a problem that
I'm having trouble figuring out a solution to - the solutions that come
to mind will all significantly increase the overhead of the loop.
As a very inadequate work-around, you might consider lowering
kern.maxvnodes to something like 20000 - that might be low enough to
not trigger the problem, but also be high enough to not significantly
affect system I/O performance.

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.

_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"


