Re: a proposed callout API



Since nearly all callout_reset() calls use the same relative timeout as
the previous call, it seems rather polluting to expose the low-level tick
calculations in the API. I'll bet if you just *CACHE* the last
translation it would be sufficient to optimize your callout paths:

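callout_reset(struct callout *c, int to_ticks, ...)
{
    if (to_ticks == c->last_to_ticks) {
        /* same timeout as last time: reuse the cached translation */
        ... use c->last_to_translated_ticks;
    } else {
        /* new timeout: translate it and remember the result */
        c->last_to_ticks = to_ticks;
        ... recalculate c->last_to_translated_ticks;
    }
    ...
}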

As far as math overhead goes... well, if you REALLY want to make things
optimal you need to get rid of all those mutex operations you are doing
in the low-level callwheel code.

I would recommend doing what we did, which is to make the call wheels
per-cpu and to issue the callout on the same cpu it was registered on.
Now, granted, DragonFly uses a more cpu-localized design, particularly
for network operations (which are the vast majority of callout operations
in the system). But you should really consider it. A cpu-localized
design replaces all mutexes and spinlocks in the implementation with
a simple critical section. Cross-cpu operations use IPI messages (which,
in DragonFly, very rarely occur since all the callout users are
cpu-localized). But assuming you deal with that issue in your network
stacks, OTHER uses of the callout API are well served by a cpu-localized
model. Because re-arming usually occurs FROM the callout callback
procedure, which itself is cpu-localized by the callout implementation,
you again wind up being able to use just a critical section and no
mutexes or spin locks.

One mutex or spinlock is worth half a dozen math operations. Even if
the locked bus cycle memory location is already owned by the calling
cpu you still wind up flushing the cpu's read and write pipeline, and
that is really nasty at the beginning of a procedure when the caller
of the procedure has just pushed a bunch of arguments onto the stack.

There is virtually no cache overhead in handling the callwheel due to
the burstiness effect of the slots, in particular when handling TCP
connections in bulk. There is so much locality of reference there
that for all intents and purposes callout_reset() becomes FREE if you
can just get rid of the mutexes.

In any case, network operations are a bad place to use fine-grained
timeouts. It just doesn't work well... for example, using a TCP retry
timeout in the microsecond range almost guarantees a ton of false hits
due to cpu latency in handling the timeout on a heavily loaded system.
You need wiggle room, and lost packets just aren't an issue on LANs.

Similarly, if you want to change tsleep to use a fine-grained value,
the same rule applies... when tsleep is called with a timeout it is
almost always called with the same timeout. But nearly all uses of
tsleep are insensitive to the granularity of the timeout, and most
remaining uses are not in critical code paths (e.g. a device driver
that is resetting some low level hardware interface or something), so
it is questionable whether changing the API would reap any visible
reward.

There are a few places where a fine-grained timer is really useful, in
particular a periodic fine-grained timer. But don't try to do it
with the callout API. I recommend taking a look at our SYSTIMER API.
We use it to drive interface polling, the scheduler, the stat clock,
the hardclock, and to rate-limit interrupts.

-Matt
Matthew Dillon
<dillon@xxxxxxxxxxxxx>
_______________________________________________
freebsd-arch@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch


