Re: cvs commit: src/sys/fs/nullfs null.h null_subr.c null_vnops.c

From: Poul-Henning Kamp (phk_at_phk.freebsd.dk)
Date: 06/18/03

  • Next message: Sheldon Hearn: "Re: cvs commit: src/sys/fs/nullfs null.h null_subr.c null_vnops.c"
    To: Dmitry Sivachenko <demon@FreeBSD.org>
    Date: Wed, 18 Jun 2003 13:53:29 +0200
    
    

    In message <20030618112226.GA42606@fling-wing.demos.su>, Dmitry Sivachenko writes:

    [I've moved this to arch@]

    >> The main problems with nullfs seem to be locking and trying to create clones
    >> of the lower vnode (wrt. the VM system and special files). Once kern/51583
    >
    >BTW, what is the reason for creating these clone vnodes?
    >Why we can't simply return the original vnode?

    This is a question of the same caliber as a kid asking mom where
    the babies come from :-)

    Back in history, when vnodes first appeared as part of stacking
    filesystems, there was no merged vm/buffer cache.

    There were also some suboptimal design "decisions" made in the VFS
    implementation, made to expedite the implementation, but introducing
    issues which "could be cleaned up later".

    NFS added a few interesting wrinkles to the vnode area, mostly because
    it does not follow the model implicitly assumed in the VFS layering.
    The buffer cache expects a disk device behind all buffers, so that
    took some hacking too.

    Then we got a semi-merged vm/buffer cache. Semi, because it was never
    finished so it became some sort of hybrid almost but not quite entirely
    unlike either state. A few filesystems got VOP_GETPAGES, none of them
    got VOP_PUTPAGES as far as I recall.

    Then we got softupdates and snapshots, which due to shortcomings in
    the vm/buf area could not be implemented in the architecturally
    obvious way, but instead had to put fingers into specfs and the
    buffer cache to get the job done.

    All of this has tangled the simple component formerly known as the
    buffer cache up in so many ways that it is very hard for anybody
    to make heads or tails of it any more.

    So I am tempted to answer your question with: "Because it is all a
    mess"

    A number of us heavy-duty people have started to say rude things
    and make menacing gestures with our flow-diagram templates in the
    general direction of the buffer cache, but any real solution is
    unlikely to happen until we are talking 6-current.

    The cleanup would probably be easier to perform if we could ditch
    the stuff and layers which have been glued on and reduce the code
    to its core functionality first, and this may indeed be what we
    have to do, but considering the list of stuff we are talking
    about, it is unlikely to be a politically feasible path to take:

            vinum -- abuses getebuf(), should be GEOM class.
            raidframe -- abuses getebuf(), should be GEOM class.
            cluster code -- must be rewritten
            snapshots -- must be untangled from the bio path.
            softupdates -- ditto.
            unionfs -- does not correctly layer VOP_STRATEGY
            nullfs -- maybe same problem.
            swap_pager -- abuses bogus vnode

    I am hoping that we may be able to carve a path by changing the
    bio structure to operate on vm pages rather than KVM mapped
    byte arrays (most disk device drivers don't care about things being
    mapped; they use bus-master DMA and only need the physical location).
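
    As a rough illustration of the idea above, here is a minimal
    userland sketch of an I/O request that carries a list of pages
    instead of a mapped byte pointer. None of this is the real FreeBSD
    struct bio: every name below (sketch_page, sketch_bio_paged,
    sketch_bio_to_segs, the 4096-byte page size) is hypothetical and
    exists only to show that a DMA-style segment list can be built
    from physical page addresses without any kernel mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, illustration only */

/* Hypothetical stand-in for a VM page: only its physical address. */
struct sketch_page {
    uint64_t phys_addr;
};

/* Mapped style, for contrast: the request points at mapped bytes. */
struct sketch_bio_mapped {
    void   *data;    /* requires a kernel virtual mapping */
    size_t  length;
};

/* Page style: the request carries pages; no mapping is needed. */
struct sketch_bio_paged {
    struct sketch_page **pages;
    size_t               npages;
    size_t               offset;  /* byte offset into the first page */
    size_t               length;  /* total bytes to transfer */
};

/* One physically contiguous piece handed to a DMA engine. */
struct sketch_dma_seg {
    uint64_t addr;
    size_t   len;
};

/*
 * Translate a paged request into DMA segments, one per page touched.
 * Returns the number of segments written, or 0 if the request does not
 * fit in the pages supplied or in maxsegs segments.
 */
size_t
sketch_bio_to_segs(const struct sketch_bio_paged *bp,
    struct sketch_dma_seg *segs, size_t maxsegs)
{
    assert(bp != NULL && segs != NULL);

    if (bp->offset >= SKETCH_PAGE_SIZE)
        return 0;

    size_t remaining = bp->length;
    size_t off = bp->offset;
    size_t n = 0;

    for (size_t i = 0; i < bp->npages && remaining > 0; i++) {
        size_t chunk = SKETCH_PAGE_SIZE - off;
        if (chunk > remaining)
            chunk = remaining;
        if (n == maxsegs)
            return 0;
        segs[n].addr = bp->pages[i]->phys_addr + off;
        segs[n].len = chunk;
        n++;
        remaining -= chunk;
        off = 0;  /* only the first page starts at an offset */
    }
    return remaining == 0 ? n : 0;
}
```

    The point of the sketch is that the translation never dereferences
    the data itself, which is all a bus-master DMA driver would need.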

    Next, giving buffers a set of object methods could maybe avoid
    the detour around VOP_BMAP and VOP_STRATEGY thereby possibly
    making it possible for softupdates and snapshots to be implemented
    entirely inside UFS/FFS.
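
    To make "a set of object methods" concrete, the following is a
    minimal userland sketch, not existing kernel code. The method
    table, its member names (bo_write, bo_done) and the toy filesystem
    are all assumptions made up for illustration. It shows generic
    code dispatching through the buffer itself, so that the owning
    filesystem sees every write and could, for instance, enforce its
    own ordering there.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_buf;

/* Hypothetical per-buffer method table supplied by the buffer's owner. */
struct sketch_buf_ops {
    int  (*bo_write)(struct sketch_buf *bp);  /* start the write */
    void (*bo_done)(struct sketch_buf *bp);   /* completion hook, may be NULL */
};

#define SKETCH_B_DONE 0x1

/* Hypothetical buffer: just enough fields to show the dispatch. */
struct sketch_buf {
    const struct sketch_buf_ops *b_ops;
    long   b_lblkno;    /* logical block number */
    int    b_flags;
    void  *b_private;   /* owner's per-buffer or per-mount state */
};

/* Generic code calls the owner's methods instead of a global path. */
int
sketch_buf_write(struct sketch_buf *bp)
{
    assert(bp != NULL && bp->b_ops != NULL && bp->b_ops->bo_write != NULL);

    int error = bp->b_ops->bo_write(bp);
    if (error == 0 && bp->b_ops->bo_done != NULL)
        bp->b_ops->bo_done(bp);
    return error;
}

/* Toy owner: records what it was asked to write. */
struct toyfs_state {
    int  writes_seen;
    long last_blk;
};

static int
toyfs_write(struct sketch_buf *bp)
{
    struct toyfs_state *st = bp->b_private;

    st->writes_seen++;
    st->last_blk = bp->b_lblkno;
    return 0;
}

static void
toyfs_done(struct sketch_buf *bp)
{
    bp->b_flags |= SKETCH_B_DONE;
}

const struct sketch_buf_ops toyfs_buf_ops = { toyfs_write, toyfs_done };
```

    In this arrangement the filesystem-specific logic lives behind
    the method table rather than in the shared I/O path.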

    I have a couple of other ideas I want to explore as well, one of
    them being not doing I/O via VCHR vnodes, but either at the fdesc
    level (when from userland) or via a dedicated API (for disk I/O
    from buf/vm).

    But I have only just started seriously investigating how all this
    can be done, and as I said, it is a royal mess, so it will take
    time no matter what I and others find.

    With that said, I will also add, that I will take an incredibly
    dim view of anybody who tries to add more gunk in this area, and
    that I am perfectly willing to derail unionfs and nullfs (or pretty
    much anything else on the list above) if that is what it takes to
    clean up the buffer cache. Any of those facilities can be reintroduced
    later on in a cleaner fashion.

    I agree that nullfs and unionfs are useful technologies, but if
    they have to be reimplemented to fit our kernel, then so be it.

    -- 
    Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
    phk@FreeBSD.ORG         | TCP/IP since RFC 956
    FreeBSD committer       | BSD since 4.3-tahoe    
    Never attribute to malice what can adequately be explained by incompetence.
    _______________________________________________
    freebsd-arch@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-arch
    To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
    
