Memory allocation performance/statistics patches

From: Robert Watson (rwatson_at_FreeBSD.org)
Date: 04/17/05

  • Next message: Robert Watson: "Re: Memory allocation performance/statistics patches"
    Date: Sun, 17 Apr 2005 15:31:50 +0100 (BST)
    To: performance@FreeBSD.org

    Attached please find three patches:

    (1) uma.diff, which modifies the UMA slab allocator to use critical
         sections instead of mutexes to protect per-CPU caches.

    (2) malloc.diff, which modifies the malloc memory allocator to use
         critical sections and per-CPU data instead of mutexes to store
         per-malloc-type statistics, coalescing the per-CPU values for the
         sysctl used to generate vmstat -m output.

    (3) mbuf.diff, which modifies the mbuf allocator to use per-CPU data and
         critical sections for statistics, instead of synchronization-free
         statistics which could result in substantial inconsistency on SMP
         systems.
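
    A minimal user-space sketch of the per-CPU statistics idea in (2) and
    (3), not the code from the attached patches. The names type_stats,
    stats_alloc, stats_free, and stats_coalesce are hypothetical, MAXCPUS is
    set to an arbitrary small value, and critical_enter()/critical_exit()
    are stand-ins that only count nesting; in the kernel they would keep the
    thread from being preempted while it updates its CPU's slot. The cpu
    argument stands in for the current CPU index.

```c
#include <assert.h>
#include <stdint.h>

#define MAXCPUS 4 /* assumption: small fixed CPU count for the sketch */

/* Hypothetical per-malloc-type statistics; one slot per CPU. */
struct type_stats {
    uint64_t allocs;     /* number of allocations */
    uint64_t frees;      /* number of frees */
    uint64_t memalloced; /* total bytes ever allocated */
    uint64_t memfreed;   /* total bytes ever freed */
};

static struct type_stats stats[MAXCPUS];

/* User-space stand-ins: they only track nesting depth. */
static int crit_nesting;
void critical_enter(void) { crit_nesting++; }
void critical_exit(void) { crit_nesting--; }

/* Record an allocation in the slot owned by the given CPU. */
void stats_alloc(int cpu, uint64_t size)
{
    assert(cpu >= 0 && cpu < MAXCPUS);
    critical_enter();
    stats[cpu].allocs++;
    stats[cpu].memalloced += size;
    critical_exit();
}

/* Record a free; it may happen on a different CPU than the allocation. */
void stats_free(int cpu, uint64_t size)
{
    assert(cpu >= 0 && cpu < MAXCPUS);
    critical_enter();
    stats[cpu].frees++;
    stats[cpu].memfreed += size;
    critical_exit();
}

/* Observer side: sum all per-CPU slots without taking any lock. */
struct type_stats stats_coalesce(void)
{
    struct type_stats total = { 0, 0, 0, 0 };
    for (int cpu = 0; cpu < MAXCPUS; cpu++) {
        total.allocs += stats[cpu].allocs;
        total.frees += stats[cpu].frees;
        total.memalloced += stats[cpu].memalloced;
        total.memfreed += stats[cpu].memfreed;
    }
    return total;
}
```

    Keeping separate allocated and freed byte totals, rather than a single
    in-use counter, is a choice made for this sketch: memory freed on a
    different CPU than the one that allocated it then never drives a
    per-CPU value negative, and bytes in use is derived only in the
    coalesced view as memalloced - memfreed.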

    These changes are facilitated by John Baldwin's recent re-introduction of
    critical section optimizations that permit critical sections to be
    implemented "in software", rather than using the hardware interrupt
    disable mechanism, which is quite expensive on modern processors
    (especially Xeon P4 CPUs). While not identical, this is similar to the
    softspl behavior in 4.x, and Linux's preemption disable mechanisms (and
    various other post-Vax systems :-)).

    The reason this is interesting is that it allows synchronization of
    per-CPU data to be performed at a much lower cost than previously, and
    consistently across UP and SMP systems. Prior to these changes, the use
    of critical sections and per-CPU data as an alternative to mutexes would
    lead to an improvement on SMP, but not on UP. So, that said, here's what
    I'd like us to look at:

    - Patches (1) and (2) are intended to improve performance by reducing the
       overhead of maintaining cache consistency and statistics for UMA and
       malloc(9), and may universally impact performance (in a small way) due
       to the breadth of their use through the kernel.

    - Patch (3) is intended to restore consistency to statistics in the
       presence of SMP and preemption, at the possible cost of some
       performance.

    I'd like to confirm that for the first two patches, for interesting
    workloads, performance generally improves, and that stability doesn't
    degrade. For the third patch, I'd like to quantify the cost of the
    changes for interesting workloads, and likewise confirm no loss of
    stability.

    Because these will have a relatively small impact, a fair amount of
    caution is required in testing. We may be talking about a percent or two,
    maybe four, difference in benchmark performance, and many benchmarks have
    a higher variance than that.
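
    One way to check whether a small difference stands out from benchmark
    variance is sketched below. This is a generic standard-error comparison
    (the squared form of Welch's t statistic), not a method proposed in this
    message; the function names and the threshold k are assumptions chosen
    for illustration. It compares the squared difference of the two sample
    means against k squared times the squared standard error of that
    difference.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples. */
double sample_mean(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

/* Unbiased sample variance (divides by n - 1); requires n > 1. */
double sample_variance(const double *x, size_t n)
{
    double m = sample_mean(x, n);
    double ss = 0.0;
    for (size_t i = 0; i < n; i++)
        ss += (x[i] - m) * (x[i] - m);
    return ss / (double)(n - 1);
}

/*
 * Returns 1 if the difference between the means of runs a (baseline) and
 * b (patched) is larger than k standard errors, 0 otherwise. Squares are
 * compared so that no square root is needed.
 */
int difference_exceeds_noise(const double *a, size_t na,
    const double *b, size_t nb, double k)
{
    assert(na > 1 && nb > 1);
    double diff = sample_mean(b, nb) - sample_mean(a, na);
    double se2 = sample_variance(a, na) / (double)na +
        sample_variance(b, nb) / (double)nb;
    return diff * diff > k * k * se2;
}
```

    With only a handful of runs per configuration, a 2% shift in the mean
    can easily fall below the threshold when individual runs vary by 10%,
    which is why repeated runs of each configuration matter here.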

    A couple of observations for those interested:

    - The INVARIANTS panic with UMA seen in some earlier patch versions is
       believed to be corrected.

    - Right now, because I use arrays of foo[MAXCPUS], I'm concerned that
       different CPUs will be writing to the same cache line as they're
       adjacent in memory. Moving to per-CPU chunks of memory to hold this
       stuff is desirable, but I think first we need to identify a model by
       which to do that cleanly. I'm not currently enamored of the 'struct
       pcpu' model, since it makes us very sensitive to ABI changes, as well as
       not offering a model by which modules can register new per-cpu data
       cleanly. I'm also inconsistent about how I dereference into the arrays,
       and intend to move to using 'curcpu' throughout.

    - Because mutexes are no longer used in UMA, nor in the other two
       patches, statistics coalesced across CPUs may be slightly
       inconsistent. I'm not all that concerned about it, but it's worth
       thinking on.

    - Malloc stats for realloc() are still broken if you apply this patch.

    - High watermarks are no longer maintained for malloc since they require a
       global notion of "high" that is tracked continuously (i.e., at each
       change), and there's no longer a global view except when the observer
       kicks in (sysctl). You can imagine various models to restore some
       notion of a high watermark, but I'm not currently sure which is the
       best. The high watermark notion is desirable though.
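
    The cache line concern with foo[MAXCPUS] arrays noted above can be
    illustrated with a small sketch. It is not taken from the patches: the
    64-byte CACHE_LINE_SIZE, the small MAXCPUS value, and all type and
    function names are assumptions, and padding each slot to a full cache
    line is only one possible answer, distinct from the per-CPU memory
    chunks discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAXCPUS 4          /* assumption: small fixed CPU count */
#define CACHE_LINE_SIZE 64 /* assumption: typical line size, not universal */

/* Packed layout: adjacent CPUs' counters are 8 bytes apart. */
static _Alignas(CACHE_LINE_SIZE) uint64_t packed[MAXCPUS];

/* Padded layout: alignment rounds each slot up to one full cache line. */
struct padded_counter {
    _Alignas(CACHE_LINE_SIZE) uint64_t count;
};

static struct padded_counter padded[MAXCPUS];

/* Returns 1 if both addresses fall within the same cache line. */
int same_cache_line(const void *a, const void *b)
{
    return (uintptr_t)a / CACHE_LINE_SIZE == (uintptr_t)b / CACHE_LINE_SIZE;
}

/* Increment the packed counter for a CPU (shares a line with neighbors). */
void packed_bump(int cpu)
{
    assert(cpu >= 0 && cpu < MAXCPUS);
    packed[cpu]++;
}

/* Increment the padded counter for a CPU (has a line to itself). */
void padded_bump(int cpu)
{
    assert(cpu >= 0 && cpu < MAXCPUS);
    padded[cpu].count++;
}
```

    The padded layout trades memory for isolation: each counter occupies
    CACHE_LINE_SIZE bytes instead of 8, so writes from different CPUs no
    longer land on the same line.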

    So this is a request for:

    (1) Stability testing of these patches. Put them on a machine, make them
         hurt. If things go South, try applying the patches one by one until
         it's clear which is the source.

    (2) Performance testing of these patches, subject to the challenges in
         testing them. If you are interested, please test each patch
         separately to evaluate its impact on your system. Then apply all
         together and see how it evens out. You may find that the mbuf
         allocator patch outweighs the benefits of the other two patches; if
         so, that is interesting and something to work on!

    I've done some micro-benchmarking using tools like netblast,
    syscall_timing, etc, but I'm interested particularly in the impact on
    macrobenchmarks.

    Thanks!

    Robert N M Watson


