Re: Packet loss every 30.999 seconds



A little progress.

I have a machine with a KTR enabled kernel running.

Another machine is running David's ffs_vfsops.c's patch.

I left two other machines (GENERIC kernels) running the packet loss test
overnight. At about 32480 seconds of uptime the problem starts. This is really
close to a 16-bit overflow (a signed 16-bit counter wraps at 32768). See
http://www.eng.oar.net/~maf/bsd6/p1.png and
http://www.eng.oar.net/~maf/bsd6/p2.png. The missing impulses at the 31 second
marks are the intervals between test runs. The window of missing packets
(the time between the two packets on either side of a missing sequence number)
is usually less than 4us, although I'm not sure gettimeofday() can be
trusted for measuring this. See https://www.eng.oar.net/~maf/bsd6/p3.png
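
For anyone wanting to reproduce the loss-window measurement, here is a minimal
sketch of how it could be computed from a receive log. It is not the test tool
used above; the struct names, the find_gaps() function, and the microsecond
timestamp field are all hypothetical, and it assumes packets arrive in order
with no sequence wrap.

```c
#include <stddef.h>
#include <stdint.h>

/* One received packet: sender sequence number and receive time in us. */
struct sample {
	uint32_t seq;
	int64_t  rx_us;
};

/* One detected loss window (hypothetical record layout). */
struct gap {
	uint32_t lost;		/* number of missing sequence numbers */
	int64_t  window_us;	/* receive-time delta across the gap */
};

/*
 * Scan samples in arrival order and record each place where the
 * sequence number jumps by more than one. Assumes no reordering and
 * no sequence wrap. Returns the number of gaps written to out.
 */
size_t
find_gaps(const struct sample *s, size_t n, struct gap *out, size_t max_out)
{
	size_t found = 0;

	for (size_t i = 1; i < n && found < max_out; i++) {
		uint32_t step = s[i].seq - s[i - 1].seq;

		if (step > 1) {
			out[found].lost = step - 1;
			out[found].window_us = s[i].rx_us - s[i - 1].rx_us;
			found++;
		}
	}
	return found;
}
```

The window is only as good as the clock behind rx_us, which is the same
caveat as with gettimeofday() above.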

Things I'll try tonight:

o Check on the patched kernel.

o Try with KTR debugging enabled from before until after an expected
  high-latency period.

o Dump all files to /dev/null to trigger the behavior.
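
A rough sketch of the last item, assuming the goal is simply to read every
regular file under a directory so that many vnodes get allocated. The function
names touch_tree() and read_one() are made up for illustration; nftw(), open(),
read() and close() are the standard POSIX calls. A find/cat pipeline would do
the same job.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <ftw.h>
#include <sys/stat.h>
#include <unistd.h>

static unsigned long files_read;	/* regular files read so far */

/* nftw() callback: read one regular file and discard the data. */
static int
read_one(const char *path, const struct stat *sb, int type, struct FTW *ftw)
{
	char buf[65536];
	int fd;

	(void)ftw;
	/* Skip directories, symlinks, FIFOs and device nodes. */
	if (type != FTW_F || !S_ISREG(sb->st_mode))
		return (0);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return (0);		/* unreadable: skip, keep walking */
	while (read(fd, buf, sizeof(buf)) > 0)
		;			/* discard, like > /dev/null */
	close(fd);
	files_read++;
	return (0);
}

/*
 * Walk the tree at root without following symlinks or crossing mount
 * points. Returns the number of regular files that were opened and read.
 */
unsigned long
touch_tree(const char *root)
{
	files_read = 0;
	(void)nftw(root, read_one, 64, FTW_PHYS | FTW_MOUNT);
	return (files_read);
}
```

Calling touch_tree("/usr/src") or similar on a large tree should be enough to
push the vnode count up, if the vnode theory is right.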

I would expect the vnode problem to look a little different on the packet
loss graphs over time. If this leads anywhere I'll add a counter
before the msleep() and see how often it's getting there.
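
As a sanity check on what that counter should read, here is a userland model
of the loop from David's patch (quoted below). This is only a sketch:
count_yields() and the yields counter are hypothetical names, and the real
kernel loop does the per-vnode sync work where the comment is.

```c
#include <stddef.h>

#define YIELD_THRESHOLD	500	/* same constant as in the patch */

/*
 * Model of the patched loop: walk nvnodes items and count how many
 * times the branch containing msleep() would be taken.
 */
unsigned long
count_yields(size_t nvnodes)
{
	int flushed_count = 0;
	unsigned long yields = 0;	/* hypothetical diagnostic counter */

	for (size_t i = 0; i < nvnodes; i++) {
		/* ... per-vnode sync work would happen here ... */
		if (flushed_count++ > YIELD_THRESHOLD) {
			flushed_count = 0;
			yields++;	/* msleep() would be called here */
		}
	}
	return (yields);
}
```

Because the test is a post-increment compared with "> 500", the branch is
first taken on the 502nd vnode and then every 502 vnodes after that. For
100000 vnodes the model gives 199 yields, so at one tick each the patch would
add roughly 0.2 seconds to a sync pass if a tick is 1ms.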

On Dec 17, 2007, at 5:24 AM, David G Lawrence wrote:
I noticed this as well some time ago. The problem has to do with the
processing (syncing) of vnodes. When the total number of allocated vnodes
in the system grows to tens of thousands, the ~31 second periodic sync
process takes a long time to run. Try this patch and let people know if
it helps your problem. It will periodically wait for one tick (1ms) every
500 vnodes of processing, which will allow other things to run.

Index: ufs/ffs/ffs_vfsops.c
===================================================================
RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_vfsops.c,v
retrieving revision 1.290.2.16
diff -c -r1.290.2.16 ffs_vfsops.c
*** ufs/ffs/ffs_vfsops.c 9 Oct 2006 19:47:17 -0000 1.290.2.16
--- ufs/ffs/ffs_vfsops.c 25 Apr 2007 01:58:15 -0000
***************
*** 1109,1114 ****
--- 1109,1115 ----
  	int softdep_deps;
  	int softdep_accdeps;
  	struct bufobj *bo;
+ 	int flushed_count = 0;

  	fs = ump->um_fs;
  	if (fs->fs_fmod != 0 && fs->fs_ronly != 0) { /* XXX */
***************
*** 1174,1179 ****
--- 1175,1184 ----
  			allerror = error;
  		vput(vp);
  		MNT_ILOCK(mp);
+ 		if (flushed_count++ > 500) {
+ 			flushed_count = 0;
+ 			msleep(&flushed_count, MNT_MTX(mp), PZERO, "syncw", 1);
+ 		}
  	}
  	MNT_IUNLOCK(mp);
  	/*

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.

