Re: 6.0-BETA2: taskqueue_drain for if_xl.c:2796

From: John Baldwin (jhb_at_FreeBSD.org)
Date: 08/16/05

  • Next message: John Baldwin: "Re: Loader serial baud rate control"
    To: freebsd-current@freebsd.org
    Date: Tue, 16 Aug 2005 13:21:13 -0400
    
    

    On Monday 15 August 2005 02:22 am, Scott Long wrote:
    > John Baldwin wrote:
    > > On Thursday 11 August 2005 07:07 pm, Scott Long wrote:
    > I still don't see what the point is of having a function that only
    > removes pending tasks from the queue if it doesn't also drain in
    > progress tasks. Simple example:
    >
    > CPU1 calls xl_stop. The XL_LOCK gets taken. At the same time, CPU2
    > calls taskqueue_swi, which pops the queue and calls xl_rxeof_task. It
    > tries to take the XL_LOCK but instead sleeps (or adaptively spins)
    > because CPU1 already has it. CPU1 calls taskqueue_stop, which doesn't
    > do anything because the task was alrady popped off.
    >
    > Now, at some point, CPU1 **MUST** drop the XL_LOCK and let CPU2 continue
    > that task to completion. There simply is no way around that, right? So
    > why muddle the API with a function that really doesn't solve any
    > problems. The argument that being able to call taskqueue_stop with
    > locks held is a convenience doesn't hold water. You'll still need to
    > drop locks in order to be sure that a mess isn't created.

    Well, you need to think through this longer. During a normal stop, we don't
    care if the task runs later if it is just going to stop anyway, which it will
    with the IFF_DRV_RUNNING checks I added (which it needs anyway even if you
    don't add a taskqueue_stop()). See, when CPU2 eventually runs the
    xl_rxeof_task, it will drop the lock and return instantly because the
    xl_stop() function clears IFF_DRV_RUNNING in addition to calling
    taskqueue_stop().

    Now, let's go back to the detach case where you actually need a drain. In
    that case you call xl_stop() first. It would be nice if that could go ahead
    and cancel the task if it's still in the queue without blocking at all.
    Calling taskqueue_stop() in xl_stop() will do that for you just fine. If you
    don't do that, you have to wait until any other tasks ahead of you on the
    queue decide to finish before your task can run. That's nuts! Using

            FOO_LOCK(sc);
            taskqueue_stop(&sc->task);
            FOO_UNLOCK(sc);
            taskqueue_drain(&sc->task);

    Results in the behavior that if the task hasn't run yet, you don't block, and
    thus you only block if the task is already running and your driver can
    guarantee that that won't be forever since it can ensure the task handler
    checks for the interface being stopped after it does its own FOO_LOCK().

    While we are on detach, you already have to add the IFF_DRV_RUNNING checks to
    the task handler even if you still use taskqueue_drain() in xl_stop() so that
    you don't block forever in that case (that is, the task handler has to
    realize that it's supposed to die so it will return and the draining thread
    will be resumed.)

    One other thing here. taskqueue_drain() is currently called from xl_stop(),
    which is called from xl_init_locked(), which is called from xl_intr(). Think
    "sleeping in an interrupt thread." There be dragons. :)

    This is common to almost all the ethernet drivers I've looked at in the past
    few weeks (de, dc, fxp, hme, sf, my, ste, pcn, el, and xl).

    > So what does that really buy you since you still have to deal with the
    > possible blocked task problem? Better to just organize the code so that
    > you can call taskqueue_dequeue with locks released and let the problem
    > be dealt with in a consistent way, rather than encourage people roll
    > their own dequeueing logic (best case) or become careless with
    > parallelization (worse case). I think that all that you are doing is
    > finding a way to cheat around a real synchronization problem. There are
    > a lot of network drivers in the tree that are notorious for
    > fundamentally poor locking design. I'd really rather see them be fixed
    > to conform to good APIs, rather than add marginal APIs that paper over
    > the problems.

    Errm, from all the driver's I've worked on so far, stop() is always called
    with the lock held the entire time and doesn't drop it. The drivers are
    written this way in 4.x as well (they don't drop spl()), so using
    callout_stop() and taskqueue_stop() instead of callout_drain() and
    taskqueue_drain() in foo_stop() is a lot simpler change. stop() is also
    called from interrupt handlers and can't really do anything that sleeps.
    Also, in general the entry points to the driver (ifnet functions, ifmedia
    functions, interrupt handlers, etc.) tend to lock the driver lock and the
    internal functions don't mess with the lock (except to drop the lock around
    calls to ifp->if_input()). That's a lot simpler than having to rewrite
    stop(). Also, letting me do a stop() that just lets the driver be in control
    of how long it has to wait in detach() when it tries to cancel its task
    rather than being subject to the whims of over tasks on the queue.

    Again, we are talking about a model where we fire off an event (stop your
    callout, stop your task) and defer the blocking of acknowledgement until a
    more convenient time. Outside of the detach() case, we don't even need to
    block waiting for the ack because there's no harm in having the event handler
    run, note that the interface is stopped, and abort. See, where's the problem
    in your original case with CPU2 running the task if the task handler checks
    the interface to see if it's been stopped and will abort if it gets run after
    CPU1 has done foo_stop()? For the non-detach() case we don't care when or if
    the task runs. For the detach() case we have to make sure it doesn't try to
    use our mutex (or the text of our kernel module) before we destroy the mutex
    and return from detach(). In other words, the drain() isn't there to
    synchronize state of the device itself, but to guarantee order of operations
    for driver teardown.

    -- 
    John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
    "Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
    

  • Next message: John Baldwin: "Re: Loader serial baud rate control"

    Relevant Pages

    • Re: xl driver proplem.
      ... to lock access to the driver until some other thread has a way of getting ... run after the interface has been stopped via xl_stop. ... drained in detach() however to make sure that detachcan safely destroy ... the driver mutex at the end of the function. ...
      (freebsd-hackers)
    • Re: miibus + USB = problem
      ... detach, it unregisters the timeout. ... with the lock held, thus ensuring deadlock if the timeout fires after ... This is why you can't run the detach routine locked in most ... be out of the driver. ...
      (freebsd-hackers)
    • Re: OS Question
      ... driver, should not distress you. ... I've made the unwarranted assumption that having multiple threads is ... To protect data in user mode ... Permanence (if apps all crash the lock state is retained because ...
      (microsoft.public.development.device.drivers)
    • Re: LOR with netisr changes
      ... ::> It's just a bug in the driver. ... the softc and is intended to be covered by the driver's softc lock. ... this is a hack to work around a recursion in net80211. ...
      (freebsd-current)
    • Re: [patch 00/15] clocksource / timekeeping rework V4 (resend V3 + bug fix)
      ... which lock already depends on the new lock. ... now at 1915353302473 nsecs ... # You can enable one or both FireWire driver stacks. ...
      (Linux-Kernel)