Re: cvs commit: src/sys/sys jail.h src/sys/kern kern_jail.c vfs_syscalls.c

From: Robert Watson (rwatson_at_FreeBSD.org)
Date: Sun, 15 Feb 2004 12:01:56 -0500 (EST)
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>

    On Sun, 15 Feb 2004, Pawel Jakub Dawidek wrote:

    > On Sat, Feb 14, 2004 at 10:31:12AM -0800, Robert Watson wrote:
    > +> Committer: Robert Watson <rwatson@FreeBSD.org>
    > +> Branch: HEAD
    > +>
    > +> Files:
    > +> 1.36 src/sys/kern/kern_jail.c
    > +> 1.337 src/sys/kern/vfs_syscalls.c
    > +> 1.20 src/sys/sys/jail.h
    > +>
    > +> Log:
    > +> By default, when a process in jail calls getfsstat(), only return the
    > +> data for the file system on which the jail's root vnode is located.
    > +> Previous behavior (show data for all mountpoints) can be restored
    > +> by setting security.jail.getfsstatroot_only to 0. Note: this also
    > +> has the effect of hiding other mounts inside a jail, such as /dev,
    > +> /tmp, and /proc, but errs on the side of leaking less information.
    >
    > I don't like this fix...
    >
    > There are many problems related to the fact that we store the path
    > where a file system is mounted as a string.
    >
    > This fix is one of them. I wrote a kld module some time ago that shows
    > file systems with the front of the path cut off (the jail chroot
    > directory was removed). This wasn't a nice, clean way, but...
    >
    > With your fix we still leak where-the-real-root-is information. Of
    > course it is much better than what we had before, but it is still not
    > complete.
    >
    > Another problem (there is a PR for it somewhere) is that when you mount
    > a file system in a chroot environment, the wrong path is stored (a path
    > relative to the chroot). This problem was really important in the past,
    > because such a file system could not be unmounted at all. With FSID it
    > can be, but the wrong path is still there.
    >
    > I think the complete way is to store the vnode of the directory where
    > the file system is mounted, instead of the directory as a string. We
    > have some ideas to explore in the future, for example allowing file
    > system mounts inside a jail if vfs.usermount is 1, and then your fix
    > will not be enough. With such a fix (vnode instead of string), we will
    > be able to always return file system names relative to the chroot
    > directory. I'm still not sure if we're able to implement this with our
    > current vn_fullpath() implementation, but we can try, or go further and
    > add a flag to this function,
    > DONT_USE_CACHE_JUST_ASK_FILE_SYSTEM_DIRECTLY (as was discussed on
    > #thatchannel). Sooner or later we must do this (before AUDIT is
    > merged?).
    >
    > I can prepare a patch to change this string to a vnode and we'll see.
    > What do you say?

    Everything involving pathnames and VFS is evil and/or difficult. This
    problem smacks every UNIX system I've seen with regular frequency, and
    it's complicated by the following:

    - Vnodes may have no name (deleted but referenced files).

    - Vnodes may have more than one name (hard links) -- not only that, but
      new names can be created for most objects by unprivileged users.

    - Names may have more than one vnode (mountpoint covering, synthetic file
      systems).

    - Cached names become stale easily and cannot be easily updated.

    - Names are relative to a process context due to notions of current
      process root and current working directory.
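
    The first two points in the list above can be observed from userland.
    What follows is a minimal sketch, assuming a POSIX system and a
    directory on a file system that supports hard links. observe_names()
    and struct link_counts are illustrative names, not part of any FreeBSD
    interface.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Link counts observed at each stage (illustrative type). */
struct link_counts {
    long one_name;   /* right after the file is created */
    long two_names;  /* after a hard link adds a second name */
    long no_names;   /* after both names are removed, fd still open */
    int  same_inode; /* 1 if both names resolved to the same object */
};

/*
 * Create a file under `dir`, give it a second name, then remove both
 * names while keeping it open. Returns 0 on success, -1 on failure.
 */
int
observe_names(const char *dir, struct link_counts *out)
{
    char a[1024], b[1024];
    struct stat sa, sb, sf;
    int fd, ok = 0;

    assert(dir != NULL && out != NULL);
    snprintf(a, sizeof(a), "%s/names_demo_a.%ld", dir, (long)getpid());
    snprintf(b, sizeof(b), "%s/names_demo_b.%ld", dir, (long)getpid());

    fd = open(a, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return (-1);

    /* One object, then two names for that same object. */
    if (fstat(fd, &sf) == 0 && link(a, b) == 0 &&
        stat(a, &sa) == 0 && stat(b, &sb) == 0) {
        out->one_name = (long)sf.st_nlink;
        out->two_names = (long)sa.st_nlink;
        out->same_inode =
            (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino);
        ok = 1;
    }

    /* Remove both names; the open descriptor keeps the object alive. */
    unlink(a);
    unlink(b);
    if (ok && fstat(fd, &sf) == 0)
        out->no_names = (long)sf.st_nlink;
    else
        ok = 0;

    close(fd);
    return (ok ? 0 : -1);
}
```

    Called with a writable directory such as /tmp on a typical local file
    system, the link count should go from 1 to 2 and then to 0 while the
    descriptor stays usable. At that point there is no name left that a
    name-lookup routine could return for the object.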

    This is further complicated by the fact that UFS and NFS both encourage a
    philosophy of names simply being a "path" to reach an object, not a
    property of the object. Trying to change these assumptions will both be
    extremely difficult, and may also be un-UNIXy. However, there are some
    very strong motivations to find at least a partial solution:

    (1) Make mount strings returned by statfs() and getfsstat() make sense
        regardless of context.

    (2) Make name pointers in procfs reliable and safe.

    (3) Provide accurate path information for security audit logs.

    Complications in solving this problem also include locking issues: it's
    generally safe to access the name cache if you have a strong vnode
    reference to look up "possible" names for an object. However, asking the
    file system for the name of an object reliably in UFS is probably both a
    disk-intensive and locking-complex operation (even pre-SMPng). The cache
    is, of course, unreliable for the reasons identified above, and also
    because intermediate vnodes in the path can be pushed out of memory,
    which makes it very expensive to pull them back in.

    If we lived in a world of HFS+ and volfs as on Darwin, we could cheat by
    returning the volfs path to the object, but that's not very useful from a
    user perspective, and so is basically useless despite being functionally
    correct (mostly).

    Finally, you might want to take a look at the implementation of
    vn_getpath() on Darwin, which relies on the stronger namespace semantics
    of HFS+, where all objects really do have parents, they maintain vnode
    back-pointers to parents, and can rely on the catalog entries for the
    directory tree being in memory (something we sacrifice for UFS directories
    for scalability reasons).

    So, I guess to conclude after railing: I went with the change I committed
    for the reason that it was the simplest change to give the desired result
    without increasing the strength of assumptions regarding the existence,
    correctness, and usefulness of pathnames. I agree we need a better
    solution, but juggling the traditional UNIX conventions for names and
    objects with the requirements of usability and security is hard. In
    earlier revisions of the patch, I did actually update the string for the
    root directory before exporting to userspace when masking other file
    system entries so that if you typed "df" in the jail, you saw the right
    "/" entry. However, I omitted this in the committed version because it
    required the getfsstat() code to know more about how Jails work, whereas
    currently there's a simple jail decision function that is invoked by
    getfsstat(). I'm willing to explore many different alternative
    approaches, but I think we should avoid complexity, and also try to avoid
    hurting ourselves too badly on the sharp edges of UNIX namespaces.
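
    The shape of that decision function, and of the root-string rewrite
    left out of the committed version, can be sketched in userland. This is
    an illustration only: struct mount_entry, struct jail_view,
    jail_can_see_mount() and filter_mounts() are hypothetical names, and
    the integer fsid comparison stands in for whatever test the kernel
    actually uses to match a mount against the jail's root vnode.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MNT_NAME_LEN 88 /* illustrative size, not taken from any header */

/* Hypothetical, simplified stand-in for one getfsstat() result entry. */
struct mount_entry {
    int  fsid;                    /* identifies the file system */
    char mntonname[MNT_NAME_LEN]; /* mount point as a host-side string */
};

/* Hypothetical stand-in for the caller's jail state. */
struct jail_view {
    int jailed;    /* nonzero if the caller is in a jail */
    int root_fsid; /* file system holding the jail's root vnode */
    int root_only; /* models security.jail.getfsstatroot_only */
};

/* Decision function: may this caller see this mount? */
int
jail_can_see_mount(const struct jail_view *jv, const struct mount_entry *m)
{
    if (!jv->jailed || !jv->root_only)
        return (1);
    return (m->fsid == jv->root_fsid);
}

/*
 * Copy the visible entries from `in` to `out`; return how many were kept.
 * With rewrite_root set, a jailed caller restricted to its root file
 * system sees that entry reported as "/" instead of the host-side path.
 */
size_t
filter_mounts(const struct jail_view *jv, const struct mount_entry *in,
    size_t n, struct mount_entry *out, int rewrite_root)
{
    size_t i, kept = 0;

    assert(jv != NULL && in != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        if (!jail_can_see_mount(jv, &in[i]))
            continue;
        out[kept] = in[i];
        if (rewrite_root && jv->jailed && jv->root_only &&
            in[i].fsid == jv->root_fsid)
            strcpy(out[kept].mntonname, "/");
        kept++;
    }
    return (kept);
}
```

    Keeping the visibility test in one small function means the caller only
    needs a yes/no answer per mount. The optional rewrite is the part that
    forces the listing code to know what a jail root is, which is the
    coupling the committed version avoids.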

    Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
    robert@fledge.watson.org      Senior Research Scientist, McAfee Research

    _______________________________________________
    freebsd-arch@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-arch
    To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"

