Re: LOR with filedesc structure and Giant

From: Terry Lambert (tlambert2_at_mindspring.com)
Date: 07/29/03

  • Next message: Lukas Ertl: "Re: fixed another leak in USB code"
    Date: Tue, 29 Jul 2003 00:04:50 -0700
    To: Robert Watson <rwatson@FreeBSD.org>
    
    

    Robert Watson wrote:
    > On Sun, 27 Jul 2003, Kris Kennaway wrote:
    > > After upgrading last night, one of the package machines found this:
    >
    > I've bumped into some similar problems -- it's a property of how we
    > current lock select(). We hold the file descriptor lock for the duration
    > of polling each object being "selected", and if any of those objects has
    > to grab a lock for any reason, it has to implicitly fall after the file
    > descriptor lock. I actually run into this in some of our MAC code,
    > because I need to grab a vnode lock to authorize polling the vnode using
    > VOP_POLL(), and since the vnode lock is a sleep lock, this generates a
    > WITNESS warning. Unfortunately, it's not immediately clear what a better
    > locking scheme would look like without going overboard on the fine-grained
    > side. We probably need to grab Giant before entering the select code
    > since it's highly likely something in there will require Giant -- it
    > reaches down into VFS, the device stuff, socket code, tc.

    This is basically an instance of the race I warned against being
    an impending danger to FreeBSD last week, now that we have the
    possibility of multiple application threads in the kernel
    simutaneously.

    The easiest fix is to take a reference on the descriptors in the
    select itself around the calls to selscan(). This is annoying,
    because it's moderately expensive.

    The problem is that you have to avoid false positives, and you
    could sleep as a result of some of the underylying implementation
    stuff.

    Effectively, this means that you have to allocate a copy of the
    descriptor table to hold a reference over the whole thing, so
    that a close of the descriptors out from under you doesn't cause
    you to panic after coming back.

    The tricky part is that you have to change the semantics of the
    selscan() (this part is ugly), which you do by verifying that
    the descriptor you are using is the same one you were using. You
    have to do this because there's no way to safely avoid the race
    of the first and second selscan, where someone closes a descriptor
    out from under you, and then opens another one in its place in a
    separate thread in the same application.

    The way Solaris deals with this issue is that all sleeps are at
    the same level, so what it can do is force the call to fail out
    (basically, a cancellation point) for outstanding read() on an
    fd in one thread, with the descriptor being closed in another.

    It turns out that parts of the Java netwoking implementation in
    the full-on Java implementation actually *depends* on this
    behaviour, so we will have to bite the bullet on it eventually,
    or rewrite that code when/if it makes it to FreeBSD.

    The only other things I've thought of that could let you fix
    this is to be Evil(tm) and fail the close() with the only error
    that would cause the application to retry it later -- EINTR.
    This is truly evil, since it's likely the application doesn't
    check the return value from close, and even if it did, an EINTR
    would likely cause it to try again immediately.

    The only other semi-sane approach is Too Evil For Words(tm), which
    would be to delay the return on the close until the reference count
    goes from 2->1; the reason this is Too Evil For Words(tm) is this:
    what do you do when one thread is blocked in the close(), and one
    is blocked in a read(), and a third thread *also* calls close()?
    You'd pretty much be guaranteeing that applications that assume
    the Solaris semantics will all break (arguably, they are already
    broken, with one thread closing files out from under another, but
    they appear to function. I guess there's always "psignal(p,SIGABRT);"
    8-) 8-)).

    I don't think FreeBSD could really implement the Solaris approach,
    without a lot of code needing to be rewritten to support the idea
    of cancellation of blocked operations pending completion (basically,
    a wake-up-any-slept-on-this-fd-process). Maybe Dragonfly BSD will
    be able to solve this elegantly, but I really doubt it's going to
    be able to do it.

    You might also be able to kludge up an "internal signal", but the
    slept thread is probably not sleeping on e.g. the descriptor address
    (maybe it's slept in a malloc, etc.), so I don't know how you'd know
    that it had a reference to the thing your trying to take away from
    it without a rendevous.

    Maybe it's time to give each blocking context a kqueue or something
    equally gross. 8-(.

    -- Terry
    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"


  • Next message: Lukas Ertl: "Re: fixed another leak in USB code"

    Relevant Pages

    • Re: Fine grain select locking.
      ... Briefly, linux uses RCU to protect the list, which is ... other hand uses the actual file lock to protect the descriptor slot. ...
      (freebsd-arch)
    • Re: Fine grain select locking.
      ... other hand uses the actual file lock to protect the descriptor slot. ... If the ref count and flags were done with atomics the main consumer of FILE_LOCK would actually be the unix domain socket garbage collection code. ...
      (freebsd-arch)
    • Re: Multiple IMAP connections to same folder
      ... it has a shared lock open on the mailbox on the server. ... So it opens another descriptor append the message to the destination, keeping the first descriptor for the open mailbox. ...
      (comp.mail.imap)
    • Re: Debugging USB device
      ... as soon as I claim being to USB HID keypad (and of course ... Note that when I remove the "keyboard" part of the descriptor to ... keep only the "LED" part (CAPS LOCK, NUM LOCK, SCROLL LOCK), the device is ... The next plug will be again a first plug. ...
      (microsoft.public.development.device.drivers)
    • Re: Fortran Guru requested
      ... > '05:00' is a character string, ... It's always a descriptor, and the difference is ... We have strong evidence that the '05:00' is being passed by reference. ... Yes, I agree that in my experience, character string literals are ...
      (comp.os.vms)