Re: cvs commit: src/sys/fs/nullfs null.h null_subr.c null_vnops.c

From: David Schultz (das_at_freebsd.org)
Date: 06/20/03

    Date: Fri, 20 Jun 2003 02:30:04 -0700
    To: Terry Lambert <tlambert2@mindspring.com>
    
    

    On Fri, Jun 20, 2003, Terry Lambert wrote:
    > David Schultz wrote:
    > > Yes, and my point was that it's important to maintain the
    > > separation, at least implicitly, in any new design. I think this
    > > point was obvious to the people concerned before I even mentioned
    > > it, so there's no need to rehash it, but the designers of certain
    > > other operating systems seem to have missed it.
    >
    > Well, Solaris "reinvented" the separate VM and buffer cache
    > in Solaris 2.8. 8-(. I wasn't sure what you were recommending
    > from what you said.

    Let me make it clear that I'm not advocating the Solaris 8
    approach. But it would seem that the FS metadata cache needs to
    be insulated from the VM cache better than priority paging can
    provide. Perhaps it would be possible to enforce a sort of
    self-tuning version of separate VM and buffer caches, where the
    buffer cache has a carefully managed RSS that can scale based on
    both FS activity and memory pressure. That way, I/O-intensive
    workloads will not be allowed to suck too many pages away from
    user processes and the VM system will be able to better estimate
    actual memory pressure.
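
    A minimal sketch of the self-tuning idea above, assuming two
    0..100 load signals are available. Every name here (bc_limits,
    bc_target, fs_activity, mem_pressure) is hypothetical and not an
    existing FreeBSD interface; the step size is an arbitrary choice
    for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tunables; these names do not exist in FreeBSD. */
struct bc_limits {
	size_t min_pages;	/* floor that protects FS metadata */
	size_t max_pages;	/* ceiling that protects user processes */
};

/*
 * Pick a new target size (in pages) for the buffer cache.
 * fs_activity and mem_pressure are assumed to be 0..100 scores,
 * e.g. a recent miss rate and a page daemon load estimate.
 */
size_t
bc_target(const struct bc_limits *lim, size_t cur_pages,
    unsigned fs_activity, unsigned mem_pressure)
{
	size_t step, delta, target;

	assert(lim->min_pages <= lim->max_pages);
	assert(fs_activity <= 100 && mem_pressure <= 100);

	/* Move by at most about one eighth of the current size. */
	step = cur_pages / 8 + 1;

	if (fs_activity > mem_pressure) {
		/* FS demand dominates: grow. */
		delta = step * (fs_activity - mem_pressure) / 100;
		target = cur_pages + delta;
	} else {
		/* Memory pressure dominates (or a tie): shrink. */
		delta = step * (mem_pressure - fs_activity) / 100;
		target = delta > cur_pages ? 0 : cur_pages - delta;
	}

	/* Clamp so neither side can starve the other completely. */
	if (target < lim->min_pages)
		target = lim->min_pages;
	if (target > lim->max_pages)
		target = lim->max_pages;
	return (target);
}
```

    With limits of 100 and 1000 pages and a current size of 800,
    full FS activity with no memory pressure moves the target up by
    one step, and the reverse moves it down by one step; the clamp
    keeps the result inside the limits in either direction.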

    > > The main problem isn't metastability or the lack of deadlock
    > > detection, it's that some workloads reasonably require more
    > > dependency tracking than the buffer cache can accommodate. At
    > > present, we can't track more than about 50 directories in the
    > > buffer cache.
    >
    > I don't know if I buy this directly. It's probably possible
    > to commit an incomplete tree, as long as it's complete from
    > the root, at any subtree point. Doing this, though, you would
    > have to switch from isosynchronous to synchronous processing on
    > the subtree for the remainder of its duration. This works,
    > because you use the associative property of the tree above to
    > replace it with a single edge segment; other orphan subtrees
    > of the same tree all have to fall into the same mode.

    I don't understand what you're getting at here. If you don't have
    enough space to cache more than 50 dependencies, you lose
    performance when your working set exceeds 50 directories, period.
    Trying to address this issue by making the softupdates flushing
    code smarter is only working around the limitations of the present
    buffer cache.
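
    The working-set point can be illustrated with a toy simulation.
    This is only a sketch: it assumes a plain LRU cache with 50 slots
    and directories visited round-robin, which is not how the real
    buffer cache or softupdates behaves. CACHE_SLOTS and lru_hits are
    hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 50	/* the "about 50 directories" figure above */

/*
 * Count hits when ndirs directories are visited round-robin for
 * the given number of passes through an LRU cache of CACHE_SLOTS.
 */
unsigned long
lru_hits(unsigned ndirs, unsigned passes)
{
	unsigned slot_dir[CACHE_SLOTS];		/* directory held by each slot */
	unsigned long slot_used[CACHE_SLOTS];	/* last access time per slot */
	unsigned long now = 0, hits = 0;
	size_t filled = 0;

	assert(ndirs > 0);

	for (unsigned p = 0; p < passes; p++) {
		for (unsigned d = 0; d < ndirs; d++) {
			size_t victim = 0;
			int hit = 0;

			now++;
			for (size_t i = 0; i < filled; i++) {
				if (slot_dir[i] == d) {
					slot_used[i] = now;
					hit = 1;
					break;
				}
				/* Track the least recently used slot. */
				if (slot_used[i] < slot_used[victim])
					victim = i;
			}
			if (hit) {
				hits++;
				continue;
			}
			/* Miss: take a free slot, else evict the LRU one. */
			if (filled < CACHE_SLOTS)
				victim = filled++;
			slot_dir[victim] = d;
			slot_used[victim] = now;
		}
	}
	return (hits);
}
```

    In this model, lru_hits(50, 10) returns 450 (everything after
    the first pass hits), while lru_hits(51, 10) returns 0: one
    extra directory is enough to make every access miss.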

    > What was Kirk's answer?

    He didn't give me one, aside from advocating backing dependencies
    with the VM system. This issue just came up in passing a while
    ago in relation to a pathological case for softupdates that
    resulted in an explosion of dependencies that filled up the buffer
    cache and caused a deadlock. ;-) (The problem has since been
    hacked around, BTW.)

    > But the quoted "50" is the ideal, when all dependent operations
    > occur in the same tick, given the current wheel size; all this
    > strategy does is up the number (the real number isn't 50, it's
    > unfortunately 'size - max_n - 1') by making them occur virtually
    > in the same tick, even if they are spread out temporally otherwise.

    I think 50 is merely a number that makes softupdates not fill up
    the buffer cache and deadlock. Keep in mind that the dependency
    graph could have a large fanout, or it could be a multigraph.
    There's no magical association between ~50 directories and the
    maximum path length in the graph. Again, it's the buffer cache
    that's the primary problem, not softupdates.
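
    To make the fanout point concrete, here is a small sketch that
    counts the nodes in a dependency tree of uniform fanout. The
    uniform tree and the name dep_nodes are assumptions made for
    illustration; real dependency graphs are irregular and, as noted
    above, may be multigraphs.

```c
#include <assert.h>
#include <limits.h>

/*
 * Number of nodes in a tree where every node down to the given
 * depth has `fanout` children (the root is at depth 0).
 */
unsigned long long
dep_nodes(unsigned fanout, unsigned depth)
{
	unsigned long long total = 1, level = 1;

	assert(fanout > 0);

	for (unsigned i = 0; i < depth; i++) {
		/* Refuse inputs whose count would overflow. */
		assert(level <= (ULLONG_MAX - total) / fanout);
		level *= fanout;	/* nodes at depth i + 1 */
		total += level;
	}
	return (total);
}
```

    A chain of length 50, dep_nodes(1, 50), has 51 nodes, while a
    single level with fanout 1000, dep_nodes(1000, 1), has 1001
    nodes and a maximum path length of 1. The number of dependencies
    to be cached is therefore not determined by path length alone.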
    _______________________________________________
    freebsd-arch@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-arch
    To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"

