Re: a proposed callout API



In message <200611281631.19224.jhb@xxxxxxxxxxx>, John Baldwin writes:

John, I would very much welcome your participation on this.

On the absolute vs relative time thing, this gets far nastier once
you start to think about it.

As far as I know, nothing in the kernel asks for sleeps until a
given wall-clock (UTC) time. Userland on the other hand often does,
and almost never should, but let's leave that behind for a moment. [1]

Suspend/resume is a tricky complication here.

Some sleeps and callouts want to sleep on the "while the CPU is
conscious" timescale, for instance for pushing dirty pages to disk
or collecting usage statistics.

Others want to sleep on the absolute (TAI) timescale, such as TCP
retransmission and keepalive timeouts. (The indicative internal/external
distinction is not safe btw.)

Right now we don't distinguish between the two cases, and my intention
was to leave this for a later stage where we could add flag-bits
to signal these desires, once a survey of the kernel code had
revealed which was the sensible default.

We can of course add the flags as no-ops already now where this
is immediately obvious to us.
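
To make the two cases concrete, here is a minimal sketch of what such
flag bits could mean. The flag names and the helper are invented for
illustration and are not an existing or proposed kernel API; the default
chosen here is arbitrary, since the survey has not been done.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; these names are invented for this sketch. */
#define	C_SKETCH_UPTIME	0x0001	/* count only time the CPU is running */
#define	C_SKETCH_TAI	0x0002	/* count all elapsed time, suspended or not */

/*
 * Time charged against a timer, given how long the system has been
 * running and how long it has been suspended since the timer was armed.
 */
int64_t
elapsed_for(int flags, int64_t running_ns, int64_t suspended_ns)
{
	assert(running_ns >= 0 && suspended_ns >= 0);
	if (flags & C_SKETCH_TAI)
		return running_ns + suspended_ns;
	/* No flag or C_SKETCH_UPTIME: arbitrary default for the sketch. */
	return running_ns;
}
```

With 5 seconds running and 60 seconds suspended, a C_SKETCH_TAI timer
has aged 65 seconds while a C_SKETCH_UPTIME timer has aged only 5.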

> Part of the idea was to fix
> places that abused tsleep(..., 1), etc. to figure out a "real" sleep
> interval.

This is going to be the major pain in the transition, no matter what
we do. Pretty much all short sleep and callout durations are bogus
because of the traditional rounding(-up) and HZ granularity.
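
A rough illustration of why short durations are bogus, as a simplified
model and not the kernel's actual conversion code: assume the request is
rounded up to whole ticks and one more tick is added because the current
tick is already partly used. The function names are made up for the
sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model (an assumption, not the kernel's code): convert a
 * requested sleep in microseconds to ticks, rounding up, plus one extra
 * tick for the partly elapsed current tick.  The tick length is
 * truncated when hz does not divide 1000000 evenly.
 */
int64_t
us_to_ticks(int64_t us, int hz)
{
	assert(hz > 0 && us >= 0);
	int64_t tick_us = 1000000 / hz;		/* length of one tick */
	return (us + tick_us - 1) / tick_us + 1;
}

/* Longest time such a sleep can last under this model, in microseconds. */
int64_t
worst_case_us(int64_t us, int hz)
{
	return us_to_ticks(us, hz) * (1000000 / hz);
}
```

Under this model a 1 ms request at HZ=100 becomes 2 ticks, so it can
last up to 20 ms, and a request of "1 tick" says nothing about the
interval the caller really wanted.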

> Also, my other API change I was going to do was something like this:
>
> msleep() -> mtx_sleep()
> msleep_spin() -> sl_sleep() [...]
> rw_sleep(), sx_sleep() [...]

I think this sounds eminently sensible. Even if we initially do
just the crude thing, getting it expressed in the API allows
us to improve the implementation later on.

Poul-Henning

[1] OK, couldn't resist:

Much of this trouble comes about because it used to be that
only the UTC clock was available, and programs haven't been
rewritten to use CLOCK_MONOTONIC where they should.
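
For the relative case, a minimal userland sketch using the POSIX
clock_gettime() call with CLOCK_MONOTONIC. The helper names are
hypothetical, and whether the monotonic clock advances while the system
is suspended depends on the platform, which this sketch does not
address.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Nanoseconds on the monotonic clock, or -1 on error.  Stepping the
 * UTC clock does not move this value.
 */
int64_t
monotonic_ns(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
		return -1;
	return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Hypothetical helper: has timeout_ns passed since start_ns? */
int
timed_out(int64_t start_ns, int64_t timeout_ns)
{
	assert(timeout_ns >= 0);
	return monotonic_ns() - start_ns >= timeout_ns;
}
```

A program records monotonic_ns() when it arms a timeout and later calls
timed_out() with that value, instead of comparing wall-clock readings.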

Examples of bogus behaviour:

Named(8) wants to time zones out on the TAI scale not the UTC scale,
so it should not be affected by NTPD stepping the clock but only
the uptime of the system. Any amount of time the system is suspended
should be tolled on the timer.

Xlock suffers from the same and gets terribly upset when
NTPD steps the clock.

Various reminder tools want to sleep until a given UTC time, but
end up sleeping the relative time we estimate until that time when
they go to sleep. If NTPD steps the clock while they sleep, they
do not find out and the reminder gets fired at the wrong time.

(Hint: Don't entrust calendar(8) with remembering your marriage
anniversary).

NTPD, on the other hand, needs to know about suspend/resume
so it can DTRT to the clock, but it doesn't get told, so it
makes a total mess of things.

One conclusion I've reached is that the kernel should issue a
SIGTIMEWARP to all processes whenever there is a UTC clock
discontinuity. It's been suggested that devd(8) should do
this but I think it is a kernel task.
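
Until something like that exists, about the best a process can do is
poll for the discontinuity itself by comparing how far CLOCK_REALTIME
and CLOCK_MONOTONIC have each advanced between two samples. A rough
sketch with made-up names follows; it is a userland workaround under
those assumptions, not the kernel mechanism suggested above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

struct clock_pair {
	int64_t real_ns;	/* CLOCK_REALTIME reading */
	int64_t mono_ns;	/* CLOCK_MONOTONIC reading */
};

static int64_t
ts_to_ns(struct timespec ts)
{
	return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Read both clocks back to back.  Returns 0 on success, -1 on error. */
int
sample_clocks(struct clock_pair *cp)
{
	struct timespec r, m;

	assert(cp != NULL);
	if (clock_gettime(CLOCK_REALTIME, &r) != 0 ||
	    clock_gettime(CLOCK_MONOTONIC, &m) != 0)
		return -1;
	cp->real_ns = ts_to_ns(r);
	cp->mono_ns = ts_to_ns(m);
	return 0;
}

/*
 * How far the UTC clock moved relative to the monotonic clock between
 * two samples.  A large absolute value suggests the clock was stepped.
 */
int64_t
utc_step_ns(const struct clock_pair *prev, const struct clock_pair *cur)
{
	return (cur->real_ns - prev->real_ns) -
	    (cur->mono_ns - prev->mono_ns);
}
```

The caller picks its own threshold for "large". Polling like this only
notices a step at the next sample, which is part of why a notification
from the kernel would be preferable.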

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk@xxxxxxxxxxx | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
_______________________________________________
freebsd-arch@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "freebsd-arch-unsubscribe@xxxxxxxxxxx"


