[PATCH] Rework how we store process times in the kernel and deferring calcru()

From: John Baldwin (jhb_at_FreeBSD.org)
Date: 10/01/04

    To: arch@FreeBSD.org
    Date: Fri, 1 Oct 2004 11:02:43 -0400
    
    

    I'll commit this soonish unless there are any objections. The basic idea is
    to store process time resource usage as raw data (i.e. as bintimes and tick
    counts) for both process usage and child usage and only calculate the timeval
    style times if they are explicitly asked for. This lets us avoid always
    calling calcru() to calculate the timeval values in exit1() for example. A
    more detailed listing of the changes follows:

    - Fix the various kern_wait() syscall wrappers to only pass in a rusage
      pointer if they are going to use the result.
    - Add a kern_getrusage() function for the ABI syscalls to use so that they
      don't have to play stackgap games to call getrusage().
    - Fix the svr4_sys_times() syscall to just call calcru() to calculate the
      times it needs rather than calling getrusage() twice with associated
      stackgap, etc.
    - Add a new rusage_ext structure to store raw time stats such as tick counts
      for user, system, and interrupt time as well as a bintime of the total
      runtime. A new p_rux field in struct proc replaces the same inline fields
      from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux
      field in struct proc contains the "raw" child time usage statistics.
      ruadd() has been changed to handle adding the associated rusage_ext
      structures as well as the values in rusage. Effectively, the values in
      rusage_ext replace the ru_utime and ru_stime values in struct rusage. These
      two fields in struct rusage are no longer used in the kernel.
    - calcru() has been split, moving the work into a static worker function,
      calcru1(), that calculates appropriate timevals for user and system time
      and updates the rux_[isu]u fields of a passed-in rusage_ext structure.
      calcru() uses a copy of the process' p_rux structure to compute the
      timevals after updating the runtime appropriately if any of the threads in
      that process are currently executing. This also includes an additional fix
      so that calcru() now correctly handles threads from the process that are
      executing on other CPUs. Also, calcru() now only locks sched_lock
      internally while doing the rux_runtime fixup. It now only requires the
      caller to hold the proc lock, and calcru1() only requires the proc lock
      internally. calcru() also no longer allows callers to ask for an interrupt
      timeval since none of them actually did.
    - A new calccru() function computes the child system and user timevals by
      calling calcru1() on p_crux. Note that this means that any code that wants
      child times must now call this function rather than reading from p_cru
      directly. This function also requires the proc lock.
    - This finishes the locking for rusage and friends, so some of the Giant
      locks in exit1() and kern_wait() are now gone.

    As a side effect of storing the raw values, the accuracy of the process timing
    has been improved. This makes benchmarking somewhat tricky, since with this
    patch user times appear to go way up while system times go way down.
    Thus, the only benchmarks I ran compared real times and compared the sum of
    the user and system times to the real times. Here are the results on a
    kernel without debugging (with WITNESS + INVARIANTS enabled, the extra
    overhead meant there was no statistically significant difference between
    before and after). For real times (100 runs of 10000 fork/wait loops):

    x smpng.fast.real
    + proc.fast.real
        N           Min           Max        Median           Avg        Stddev
    x 100          2.97          3.08          2.99        2.9959   0.018968075
    + 100          2.88          2.99          2.93        2.9362   0.017568337
    Difference at 95.0% confidence
            -0.0597 +/- 0.0050674
            -1.99272% +/- 0.169145%
            (Student's t, pooled s = 0.0182816)

    So, roughly a 2% improvement in real time. As for the accuracy
    "improvements", the numbers comparing the sum of user + sys time to "real"
    time on the kernel without the patch are:

    x smpng.fast.real
    + smpng.fast.total
        N           Min           Max        Median           Avg        Stddev
    x 100          2.97          3.08          2.99        2.9959   0.018968075
    + 100          2.83          2.93          2.86        2.8601   0.016111668
    Difference at 95.0% confidence
            -0.1358 +/- 0.0048779
            -4.53286% +/- 0.162819%
            (Student's t, pooled s = 0.0175979)

    And for the kernel with the patch:

    x proc.fast.real
    + proc.fast.total
        N           Min           Max        Median           Avg        Stddev
    x 100          2.88          2.99          2.93        2.9362   0.017568337
    + 100          2.85          2.96          2.92        2.9201   0.017551943
    Difference at 95.0% confidence
            -0.0161 +/- 0.00486742
            -0.548328% +/- 0.165773%
            (Student's t, pooled s = 0.0175601)

    Thus, the total counts are closer to the real times with the patch (about
    0.55% short) than without it (about 4.53% short). Given that these results
    were repeated numerous times with different benchmarks on an idle box in the
    same state, I feel that these differences indicate an improvement in the
    accuracy of the accounting.

    The patch is at http://www.FreeBSD.org/~jhb/patches/rusage_ext.patch and is
    largely based on a patch originally submitted by bde@.

    -- 
    John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
    "Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
    _______________________________________________
    freebsd-arch@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-arch
    To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
    
