Memory allocation performance/statistics patches
From: Robert Watson (rwatson_at_FreeBSD.org)
Date: 04/17/05
Date: Sun, 17 Apr 2005 15:31:50 +0100 (BST)
To: performance@FreeBSD.org
Attached please find three patches:
(1) uma.diff, which modifies the UMA slab allocator to use critical
sections instead of mutexes to protect per-CPU caches.
(2) malloc.diff, which modifies the malloc memory allocator to use
critical sections and per-CPU data instead of mutexes to store
per-malloc-type statistics, coalescing the per-CPU values in the
sysctl used to generate vmstat -m output.
(3) mbuf.diff, which modifies the mbuf allocator to use per-CPU data and
critical sections for statistics, instead of synchronization-free
statistics which could result in substantial inconsistency on SMP
systems.
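The pattern shared by patches (2) and (3) can be illustrated with a minimal userspace sketch. It is not taken from the attached diffs: every name here (the sketch_ prefix, SKETCH_MAXCPUS, the struct fields) is hypothetical, and the critical-section macros are no-op stand-ins for what would be critical_enter()/critical_exit() in the kernel. Each CPU updates only its own slot, and a reader sums the slots on demand, as the sysctl handler would.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAXCPUS 4	/* hypothetical CPU count for this sketch */

/* No-op stand-ins; a kernel would use critical_enter()/critical_exit(). */
#define SKETCH_CRITICAL_ENTER()	((void)0)
#define SKETCH_CRITICAL_EXIT()	((void)0)

/* Hypothetical per-type statistics; one copy is kept per CPU. */
struct sketch_type_stats {
	uint64_t allocs;
	uint64_t frees;
	uint64_t bytes_allocated;
	uint64_t bytes_freed;
};

static struct sketch_type_stats sketch_stats[SKETCH_MAXCPUS];

/* Record an allocation in the slot owned by the given CPU. */
void
sketch_record_alloc(int cpu, uint64_t size)
{
	assert(cpu >= 0 && cpu < SKETCH_MAXCPUS);
	SKETCH_CRITICAL_ENTER();
	sketch_stats[cpu].allocs++;
	sketch_stats[cpu].bytes_allocated += size;
	SKETCH_CRITICAL_EXIT();
}

/* Record a free; it may run on a different CPU than the allocation did. */
void
sketch_record_free(int cpu, uint64_t size)
{
	assert(cpu >= 0 && cpu < SKETCH_MAXCPUS);
	SKETCH_CRITICAL_ENTER();
	sketch_stats[cpu].frees++;
	sketch_stats[cpu].bytes_freed += size;
	SKETCH_CRITICAL_EXIT();
}

/* Reader side: sum every CPU's slot into one coalesced view. */
struct sketch_type_stats
sketch_coalesce(void)
{
	struct sketch_type_stats total = { 0, 0, 0, 0 };

	for (int cpu = 0; cpu < SKETCH_MAXCPUS; cpu++) {
		total.allocs += sketch_stats[cpu].allocs;
		total.frees += sketch_stats[cpu].frees;
		total.bytes_allocated += sketch_stats[cpu].bytes_allocated;
		total.bytes_freed += sketch_stats[cpu].bytes_freed;
	}
	return (total);
}
```

Because a free can land on a different CPU than the matching allocation, a single slot's counts are not meaningful in isolation; only the coalesced totals are.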
These changes are facilitated by John Baldwin's recent re-introduction of
critical section optimizations that permit critical sections to be
implemented "in software", rather than using the hardware interrupt
disable mechanism, which is quite expensive on modern processors
(especially Xeon P4 CPUs). While not identical, this is similar to the
softspl behavior in 4.x, and Linux's preemption disable mechanisms (and
various other post-Vax systems :-)).
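To make the "in software" idea concrete, here is a rough userspace model, assuming a per-thread nesting counter and a deferred-preemption flag. The struct, its fields, and the sketch_ functions are hypothetical illustrations, not the kernel's actual code: entering a critical section only bumps a counter, a preemption request that arrives inside the section is merely noted, and the deferred preemption is honoured when the outermost section exits.

```c
#include <assert.h>

/* Hypothetical per-thread state for the model. */
struct sketch_thread {
	int critnest;		/* nesting depth of critical sections */
	int owepreempt;		/* a preemption was requested while nested */
	int preemptions;	/* how many preemptions actually happened */
};

/* Entering costs one increment; no interrupt-disable instruction. */
void
sketch_critical_enter(struct sketch_thread *td)
{
	td->critnest++;
}

/* Called when something (e.g. an interrupt) wants to preempt the thread. */
void
sketch_request_preempt(struct sketch_thread *td)
{
	if (td->critnest > 0)
		td->owepreempt = 1;	/* defer until the section ends */
	else
		td->preemptions++;	/* preempt immediately */
}

/* Leaving the outermost section performs any deferred preemption. */
void
sketch_critical_exit(struct sketch_thread *td)
{
	assert(td->critnest > 0);
	td->critnest--;
	if (td->critnest == 0 && td->owepreempt) {
		td->owepreempt = 0;
		td->preemptions++;
	}
}
```

In this model the common path (enter, touch per-CPU data, exit, no preemption pending) is an increment, a decrement, and a test, which is the source of the cost saving over disabling interrupts.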
The reason this is interesting is that it allows synchronization of
per-CPU data to be performed at a much lower cost than previously, and
consistently across UP and SMP systems. Prior to these changes, the use
of critical sections and per-CPU data as an alternative to mutexes would
lead to an improvement on SMP, but not on UP. So, that said, here's what
I'd like us to look at:
- Patches (1) and (2) are intended to improve performance by reducing the
overhead of maintaining cache consistency and statistics for UMA and
malloc(9), and may universally impact performance (in a small way) due
to the breadth of their use through the kernel.
- Patch (3) is intended to restore consistency to statistics in the
presence of SMP and preemption, at the possible cost of some
performance.
I'd like to confirm that for the first two patches, for interesting
workloads, performance generally improves, and that stability doesn't
degrade. For the third patch, I'd like to quantify the cost of the
changes for interesting workloads, and likewise confirm no loss of
stability.
Because these will have a relatively small impact, a fair amount of
caution is required in testing. We may be talking about a percent or two,
maybe four, difference in benchmark performance, and many benchmarks have
a higher variance than that.
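One way to judge whether a small difference stands out from run-to-run noise is to compare the difference in means against the variance of repeated runs. The sketch below is only an illustration of that idea using the square of Welch's t statistic; the function names are hypothetical, and the squared form is used purely to avoid a dependency on the math library.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples. */
double
sketch_mean(const double *x, size_t n)
{
	double sum = 0.0;

	assert(n >= 1);
	for (size_t i = 0; i < n; i++)
		sum += x[i];
	return (sum / (double)n);
}

/* Unbiased sample variance (divides by n - 1). */
double
sketch_variance(const double *x, size_t n)
{
	double m, sq = 0.0;

	assert(n >= 2);
	m = sketch_mean(x, n);
	for (size_t i = 0; i < n; i++)
		sq += (x[i] - m) * (x[i] - m);
	return (sq / (double)(n - 1));
}

/*
 * Square of Welch's t statistic for two sets of benchmark runs:
 *   t^2 = (mean_a - mean_b)^2 / (var_a / n_a + var_b / n_b)
 * Not meaningful if both sets have zero variance (division by zero).
 */
double
sketch_welch_t_squared(const double *a, size_t na,
    const double *b, size_t nb)
{
	double diff = sketch_mean(a, na) - sketch_mean(b, nb);
	double noise = sketch_variance(a, na) / (double)na +
	    sketch_variance(b, nb) / (double)nb;

	return (diff * diff / noise);
}
```

A small value means the difference is comparable to the noise. How large a value is needed depends on the number of runs, so with only a handful of runs per configuration a difference of a percent or two may well be indistinguishable from variance.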
A couple of observations for those interested:
- The INVARIANTS panic with UMA seen in some earlier patch versions is
believed to be corrected.
- Right now, because I use arrays of foo[MAXCPUS], I'm concerned that
different CPUs will be writing to the same cache line as they're
adjacent in memory. Moving to per-CPU chunks of memory to hold this
stuff is desirable, but I think first we need to identify a model by
which to do that cleanly. I'm not currently enamored of the 'struct
pcpu' model, since it makes us very sensitive to ABI changes, as well as
not offering a model by which modules can register new per-cpu data
cleanly. I'm also inconsistent about how I dereference into the arrays,
and intend to move to using 'curcpu' throughout.
- Because mutexes are no longer used to protect the per-CPU caches in
UMA, or the statistics in the other two allocators, stats that are
read across different CPUs and coalesced may be slightly
inconsistent. I'm not all that concerned about it, but it's worth
thinking on.
- Malloc stats for realloc() are still broken if you apply this patch.
- High watermarks are no longer maintained for malloc since they require a
global notion of "high" that is tracked continuously (i.e., at each
change), and there's no longer a global view except when the observer
kicks in (sysctl). You can imagine various models to restore some
notion of a high watermark, but I'm not currently sure which is the
best. The high watermark notion is desirable though.
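Two of the observations above, cache-line sharing in foo[MAXCPUS] arrays and the loss of the high watermark, can be illustrated together in a small userspace sketch. It is one possible model under stated assumptions, not what the patches do: the 64-byte line size, SKETCH_MAXCPUS, and all sketch_ names are hypothetical. Each slot is aligned to its own cache line, and the observer records the highest coalesced value it has happened to see, which can miss peaks that occur between observations.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAXCPUS		4	/* hypothetical CPU count */
#define SKETCH_CACHE_LINE	64	/* assumed cache line size in bytes */

/*
 * One slot per CPU. The alignment rounds the struct size up to a full
 * line, so adjacent array elements never share a cache line.
 */
struct sketch_slot {
	_Alignas(SKETCH_CACHE_LINE) int64_t inuse_delta;
};

static struct sketch_slot sketch_slots[SKETCH_MAXCPUS];
static int64_t sketch_high_seen;	/* updated only by the observer */

/*
 * Per-CPU update. The delta is signed because memory allocated on one
 * CPU may be freed on another, driving that CPU's slot negative.
 */
void
sketch_adjust(int cpu, int64_t delta)
{
	assert(cpu >= 0 && cpu < SKETCH_MAXCPUS);
	sketch_slots[cpu].inuse_delta += delta;
}

/* Observer: coalesce all slots and remember the largest total seen. */
int64_t
sketch_observe(void)
{
	int64_t total = 0;

	for (int cpu = 0; cpu < SKETCH_MAXCPUS; cpu++)
		total += sketch_slots[cpu].inuse_delta;
	if (total > sketch_high_seen)
		sketch_high_seen = total;
	return (total);
}

/* Highest coalesced value observed so far (a lower bound on the peak). */
int64_t
sketch_high(void)
{
	return (sketch_high_seen);
}
```

The padding trades memory for fewer cache-line bounces; the sampled watermark is cheap but only approximates the true high, since nothing tracks the global total between observer runs.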
So this is a request for:
(1) Stability testing of these patches. Put them on a machine, make them
hurt. If things go South, try applying the patches one by one until
it's clear which is the source.
(2) Performance testing of these patches, subject to the challenges in
testing them described above. If you are interested, please test each
patch separately to evaluate its impact on your system. Then apply
all together and see how it evens out. You may find that the cost of
the mbuf allocator patch outweighs the benefits of the other two
patches; if so, that is interesting and something to work on!
I've done some micro-benchmarking using tools like netblast,
syscall_timing, etc, but I'm interested particularly in the impact on
macrobenchmarks.
Thanks!
Robert N M Watson
- TEXT/PLAIN attachment: uma.diff
- TEXT/PLAIN attachment: mbuf.diff
- TEXT/PLAIN attachment: malloc.diff