Re: File trees: the deeper, the weirder
- From: Kostik Belousov <kostikbel@xxxxxxxxx>
- Date: Sat, 18 Nov 2006 13:05:44 +0200
On Sat, Nov 18, 2006 at 12:54:00PM +0300, Yar Tikhiy wrote:
On Mon, Oct 30, 2006 at 03:47:37PM +0200, Kostik Belousov wrote:
On Mon, Oct 30, 2006 at 04:05:19PM +0300, Yar Tikhiy wrote:
On Sun, Oct 29, 2006 at 11:32:58AM -0500, Matt Emmerton wrote:
[ Restoring some OP context.]
On Sun, Oct 29, 2006 at 05:07:16PM +0300, Yar Tikhiy wrote:
As for the said program, it keeps its 1 Hz pace, mostly waiting on
"vlruwk". It's killable, after a delay. The system doesn't show ...
Weird, eh? Any ideas what's going on?
I would guess that you need a new vnode to create the new file, but no
vnodes are obvious candidates for freeing because they all have a child
directory in use. Is there some sort of vnode clearing that goes on every
second if we are short of vnodes?
See sys/kern/vfs_subr.c, subroutine getnewvnode(). We call msleep() if we're
waiting on vnodes to be created (or recycled). And just look at the 'hz'
parameter passed to msleep()!
The calling process's mkdir() will end up waiting in getnewvnode() (in
"vlruwk" state) while the vnlru kernel thread does its thing (which is to
recycle vnodes.)
Either the vnlru kernel thread has to work faster, or the caller has to
sleep less, in order to avoid this lock-step behaviour.
I'm afraid that, though your analysis is right, you arrive at the wrong
conclusions. The process waits for the whole second in getnewvnode()
because the vnlru thread cannot free as many vnodes as it wants to.
vnlru_proc() will wake up sleepers on vnlruproc_sig (i.e.,
getnewvnode()) only if (numvnodes <= desiredvnodes * 9 / 10).
Whether this condition is attainable depends on vlrureclaim() (called
from the vnlru thread) freeing vnodes at a sufficient rate. Perhaps
vlrureclaim() just can't keep up the pace under these conditions.
debug.vnlru_nowhere increasing is an indication of that. Consequently,
each getnewvnode() call sleeps 1 second, then grabs a vnode beyond
desiredvnodes. It's no surprise that the 1 second delays start to
appear after approx. kern.maxvnodes directories were created.
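
The arithmetic above can be modelled in userland. This is only a rough
sketch, not kernel code: the function names and the reclaim_per_pass
parameter are hypothetical, and the loop assumes one vnlru pass per
getnewvnode() call and a full one-second sleep whenever the 9/10
condition is not met.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the wake-up test quoted above: sleepers are
 * woken only once numvnodes has dropped to 90% of desiredvnodes.
 */
static bool
vnlru_would_wake_sleepers(long numvnodes, long desiredvnodes)
{
	return (numvnodes <= desiredvnodes * 9 / 10);
}

/*
 * Assumed model of the caller's side: when over the limit, one reclaim
 * pass runs; if the 9/10 condition is still not met, the caller sleeps
 * out the whole one-second timeout and then takes a vnode anyway.
 * Returns the number of seconds spent sleeping.
 */
static long
model_stall_seconds(long numvnodes, long desiredvnodes,
    long reclaim_per_pass, long creations)
{
	long stalled = 0;

	for (long i = 0; i < creations; i++) {
		if (numvnodes > desiredvnodes) {
			numvnodes -= reclaim_per_pass;
			if (!vnlru_would_wake_sleepers(numvnodes,
			    desiredvnodes))
				stalled++;	/* full timeout elapsed */
		}
		numvnodes++;	/* vnode allocated beyond the limit */
	}
	return (stalled);
}
```

With reclaim_per_pass set to 0 (nothing can be freed), every creation
past the limit costs one second in this model, which matches the 1 Hz
pace reported at the top of the thread.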
I think that David is right. The references _from_ the directory make it immune
to vnode reclamation. Try this patch. It is very unfair to lsof.
Index: sys/kern/vfs_subr.c
===================================================================
RCS file: /usr/local/arch/ncvs/src/sys/kern/vfs_subr.c,v
retrieving revision 1.685
diff -u -r1.685 vfs_subr.c
--- sys/kern/vfs_subr.c 2 Oct 2006 07:25:58 -0000 1.685
+++ sys/kern/vfs_subr.c 30 Oct 2006 13:44:59 -0000
@@ -582,7 +582,7 @@
* If it's been deconstructed already, it's still
* referenced, or it exceeds the trigger, skip it.
*/
- if (vp->v_usecount || !LIST_EMPTY(&(vp)->v_cache_src) ||
+ if (vp->v_usecount || /* !LIST_EMPTY(&(vp)->v_cache_src) || */
(vp->v_iflag & VI_DOOMED) != 0 || (vp->v_object != NULL &&
vp->v_object->resident_page_count > trigger)) {
VI_UNLOCK(vp);
@@ -607,7 +607,7 @@
* interlock, the other thread will be unable to drop the
* vnode lock before our VOP_LOCK() call fails.
*/
- if (vp->v_usecount || !LIST_EMPTY(&(vp)->v_cache_src) ||
+ if (vp->v_usecount || /* !LIST_EMPTY(&(vp)->v_cache_src) || */
(vp->v_object != NULL &&
vp->v_object->resident_page_count > trigger)) {
VOP_UNLOCK(vp, LK_INTERLOCK, td);
By the way, what do you think v_cache_src is for? The only two
places it is used in the kernel are in the unused function
cache_leaf_test() and this one, in vlrureclaim(). Is its main
purpose just to keep directory vnodes that are referenced by nc_dvp
in some namecache entries?
I think so. Nowadays it mostly gives immunity to the vnodes that
could be used for getcwd()/lsof path lookups through the namecache.
Did my change help with your load?
cache_leaf_test() seems to be the way to go: partition vlru reclaim into
two stages. The first stage reclaims only leaf vnodes (that is, vnodes that
do not contain child dirs in the namecache). The second stage is fired only
if the first stage failed to free anything, and simply ignores v_cache_src,
as in my change. See the comment for rev. 1.56 of vfs_cache.c.
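
A rough userland sketch of that two-stage idea, for illustration only.
struct model_vnode, reclaim_pass() and two_stage_reclaim() are
hypothetical names, not kernel interfaces; has_cache_src stands in for
a non-empty v_cache_src list, and locking, VI_DOOMED and the page-count
trigger are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct vnode. */
struct model_vnode {
	int	usecount;	/* active references, like v_usecount */
	bool	has_cache_src;	/* models !LIST_EMPTY(&vp->v_cache_src) */
	bool	reclaimed;	/* set once this model has freed it */
};

/*
 * One reclaim pass. In-use vnodes are always skipped; vnodes that are
 * the parent of some namecache entry are skipped unless
 * ignore_cache_src is set.
 */
static size_t
reclaim_pass(struct model_vnode *v, size_t n, bool ignore_cache_src)
{
	size_t freed = 0;

	for (size_t i = 0; i < n; i++) {
		if (v[i].reclaimed || v[i].usecount != 0)
			continue;
		if (!ignore_cache_src && v[i].has_cache_src)
			continue;
		v[i].reclaimed = true;
		freed++;
	}
	return (freed);
}

/*
 * Stage one frees leaf vnodes only. Stage two runs only if stage one
 * freed nothing, and then ignores the namecache immunity.
 */
static size_t
two_stage_reclaim(struct model_vnode *v, size_t n)
{
	size_t freed = reclaim_pass(v, n, false);

	if (freed == 0)
		freed = reclaim_pass(v, n, true);
	return (freed);
}
```

The model does not clear has_cache_src on a parent when its last child
is reclaimed, so it understates how far repeated stage-one passes could
get on a real directory chain.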