Re: [patch, try 1] Re: Trouble with NFSd under 6.1-Stable, any ideas?



On 5/25/06, Konstantin Belousov <kostikbel@xxxxxxxxx> wrote:
On Thu, May 25, 2006 at 01:19:26AM -0400, Kris Kennaway wrote:
> On Wed, May 24, 2006 at 11:48:53PM -0400, Howard Leadmon wrote:
>
> > So here is what changed at that delta. Under the one that works, vfs_lookup.c is:
> >
> > Edit src/sys/kern/vfs_lookup.c
> > Add delta 1.80.2.6 2006.03.31.07.39.24 kris
> >
> >
> > Under the one that fails, vfs_lookup.c is:
> >
> > Edit src/sys/kern/vfs_lookup.c
> > Add delta 1.80.2.7 2006.04.30.03.57.46 kris
> >
> >
> >
> > So I stand corrected on my last post, the issue is in fact in this module, as
> > just taking that module back to 1.80.2.6 fixes the problem with my server. I
> > even took multiple NFS clients and gave them a heavy workload, and CPU still
> > remained reasonable, and very responsive. As soon as I rev to the new
> > version, NFS breaks badly and even a single client doing something like a du
> > of a directory structure results in sluggishness and extreme CPU usage.
>
> Yep, unfortunately this commit was necessary to fix other bugs. Jeff
> said he should have time to look at it next week.
>
> Kris

I tried to debug the problem. First, I have to admit that I cannot
reproduce the problem on a GENERIC kernel. Only after QUOTAS were added
and, correspondingly, UFS started to require Giant,
did I get the described behaviour. Below are the changes to the GENERIC
config file that I made to reproduce the problem.

[...]
After that, the server machine easily panics on

KASSERT(!(debug_mpsafenet == 1 && mtx_owned(&Giant)),
("nfssvc_nfsd(): debug.mpsafenet=1 && Giant"));

from nfsserver/nfs_syscalls.c, line 570.

As I understand the problem, kern/vfs_lookup.c:lookup() can
acquire additional locks on Giant, indicating this by the GIANTHELD
flag in nd. All processing in nfsserver already runs with Giant held,
so I just dropped those excess locks after return from lookup.
The system with the patch applied survived a smoke test (the client did
du on the mounted dir, the patch was generated from the exported fs, etc.).
nfsd eats no more than 25% of CPU (with INVARIANTS).
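
To illustrate the idea only, here is a minimal user-space sketch of the
pattern described above. It is not the patch and not kernel code: Giant is
modelled as a plain recursion counter, and every name in it (model_giant,
model_nameidata, MODEL_GIANTHELD, model_lookup, model_server_request) is
hypothetical. It assumes that the lookup takes the lock one extra time and
reports that through a flag, and that the caller, which already holds the
lock, is responsible for releasing the extra acquisition.

```c
/*
 * User-space model only. A recursion counter stands in for Giant;
 * none of these names are FreeBSD kernel APIs.
 */
#include <assert.h>

#define MODEL_GIANTHELD 0x1  /* hypothetical stand-in for the GIANTHELD flag */

struct model_giant { int depth; };      /* recursion depth of the lock */
struct model_nameidata { int flags; };  /* stand-in for "nd" */

static void model_lock(struct model_giant *g)
{
    g->depth++;
}

static void model_unlock(struct model_giant *g)
{
    assert(g->depth > 0);
    g->depth--;
}

/* Models a lookup that takes the lock once more and says so via the flag. */
static void model_lookup(struct model_giant *g, struct model_nameidata *nd)
{
    model_lock(g);
    nd->flags |= MODEL_GIANTHELD;
}

/*
 * Models a server request that already runs with the lock held.
 * With drop_extra != 0 the extra acquisition made by the lookup is
 * released after it returns; with drop_extra == 0 it is leaked.
 * Returns the remaining recursion depth (0 means nothing leaked).
 */
static int model_server_request(struct model_giant *g, int drop_extra)
{
    struct model_nameidata nd = { 0 };

    model_lock(g);                      /* server path holds the lock */
    model_lookup(g, &nd);
    if (drop_extra && (nd.flags & MODEL_GIANTHELD)) {
        model_unlock(g);                /* undo the lookup's acquisition */
        nd.flags &= ~MODEL_GIANTHELD;
    }
    model_unlock(g);                    /* undo the server's own acquisition */
    return g->depth;
}
```

In this model, calling model_server_request() with drop_extra set to 0
leaves the counter at 1 after the request, which is one way to read the
situation where Giant is unexpectedly still owned; with drop_extra set to 1
the counter returns to 0.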

Users who reported the problem and are willing to help, please
try the patch (generated against STABLE) and give feedback.

[...]

Hi Konstantin and others,

I'm now running RELENG_6_1 source as of Apr 30 04:00 UTC + your
patch. The nfsd is quite happy! After the client's du finishes, it
stays idle as expected (eats 0.00% CPU).

Thank you very much.

Regards,
Rong-En Fan
_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"


