Re: New libc malloc patch



On Sun, Dec 11, 2005 at 08:29:07PM -0500, Kris Kennaway wrote:

> I'll try to test this on a 4 CPU amd64 machine next.

phkmalloc:

# ./malloc-test 1024 10000000 1
Starting test with 1 thread...
Thread 5298176 adjusted timing: 4.173052 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 2
Starting test with 2 threads...
Thread 5299200 adjusted timing: 325.108643 seconds for 10000000 requests of 1024 bytes.
Thread 5298176 adjusted timing: 325.202485 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 3
Starting test with 3 threads...
Thread 5414912 adjusted timing: 1133.238459 seconds for 10000000 requests of 1024 bytes.
Thread 5299200 adjusted timing: 1134.525255 seconds for 10000000 requests of 1024 bytes.
Thread 5298176 adjusted timing: 1134.539555 seconds for 10000000 requests of 1024 bytes.

jemalloc:

# ./malloc-test 1024 10000000 1
Starting test with 1 thread...
Thread 1073760528 adjusted timing: 3.777175 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 2
Starting test with 2 threads...
Thread 1073760560 adjusted timing: 3.851702 seconds for 10000000 requests of 1024 bytes.
Thread 1073761584 adjusted timing: 3.887943 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 3
Starting test with 3 threads...
Thread 1073760528 adjusted timing: 3.866206 seconds for 10000000 requests of 1024 bytes.
Thread 1073761552 adjusted timing: 13.382795 seconds for 10000000 requests of 1024 bytes.
Thread 1073762688 adjusted timing: 14.407229 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 4
Starting test with 4 threads...
Thread 1073760528 adjusted timing: 3.782923 seconds for 10000000 requests of 1024 bytes.
Thread 1073763792 adjusted timing: 6.668655 seconds for 10000000 requests of 1024 bytes.
Thread 1073762688 adjusted timing: 14.346569 seconds for 10000000 requests of 1024 bytes.
Thread 1073761584 adjusted timing: 14.680211 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 5
Starting test with 5 threads...
Thread 1073760560 adjusted timing: 4.748248 seconds for 10000000 requests of 1024 bytes.
Thread 1073761584 adjusted timing: 9.898153 seconds for 10000000 requests of 1024 bytes.
Thread 1073764896 adjusted timing: 13.019884 seconds for 10000000 requests of 1024 bytes.
Thread 1073762688 adjusted timing: 15.326908 seconds for 10000000 requests of 1024 bytes.
Thread 1073763792 adjusted timing: 15.442164 seconds for 10000000 requests of 1024 bytes.

So jemalloc is 1.1 times faster than phkmalloc single-threaded, and
107 times faster with 3 threads (comparing mean per-thread times).
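
The two ratios can be reproduced from the libpthread timings above. This is a minimal sketch, assuming the figures compare mean per-thread times (the post does not say how they were derived). The helper name speedup is made up for illustration.

```python
from statistics import mean

# Per-thread "adjusted timing" values in seconds, copied from the
# libpthread runs above. Keys are thread counts.
phkmalloc = {
    1: [4.173052],
    3: [1133.238459, 1134.525255, 1134.539555],
}
jemalloc = {
    1: [3.777175],
    3: [3.866206, 13.382795, 14.407229],
}


def speedup(baseline, candidate):
    """Ratio of mean per-thread times; a value above 1 means candidate is faster."""
    return mean(baseline) / mean(candidate)


for threads in sorted(phkmalloc):
    ratio = speedup(phkmalloc[threads], jemalloc[threads])
    print(f"{threads} thread(s): jemalloc is {ratio:.1f}x faster")
```

With means this gives about 1.1 for one thread and about 107 for three. Comparing only the slowest thread of each 3-thread run would give a smaller ratio, roughly 79, so the choice of summary statistic matters when per-thread times are as uneven as they are for jemalloc here.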

With libthr instead of libpthread:

phkmalloc:

# ./malloc-test 1024 10000000 1
Starting test with 1 thread...
Thread 5255680 adjusted timing: 2.357247 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 2
Starting test with 2 threads...
Thread 5256192 adjusted timing: 10.964918 seconds for 10000000 requests of 1024 bytes.
Thread 5255680 adjusted timing: 11.001288 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 3
Starting test with 3 threads...
Thread 5255680 adjusted timing: 17.467754 seconds for 10000000 requests of 1024 bytes.
Thread 5256704 adjusted timing: 17.724583 seconds for 10000000 requests of 1024 bytes.
Thread 5256192 adjusted timing: 17.913381 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 4
Starting test with 4 threads...
Thread 5255680 adjusted timing: 42.715420 seconds for 10000000 requests of 1024 bytes.
Thread 5256192 adjusted timing: 43.481252 seconds for 10000000 requests of 1024 bytes.
Thread 5256704 adjusted timing: 43.871452 seconds for 10000000 requests of 1024 bytes.
Thread 5257216 adjusted timing: 43.887820 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 5
Starting test with 5 threads...
Thread 5255680 adjusted timing: 139.316332 seconds for 10000000 requests of 1024 bytes.
Thread 5257216 adjusted timing: 140.117720 seconds for 10000000 requests of 1024 bytes.
Thread 5256192 adjusted timing: 140.134057 seconds for 10000000 requests of 1024 bytes.
Thread 5256704 adjusted timing: 140.855289 seconds for 10000000 requests of 1024 bytes.
Thread 5257728 adjusted timing: 140.865934 seconds for 10000000 requests of 1024 bytes.

jemalloc:

# ./malloc-test 1024 10000000 1
Starting test with 1 thread...
Thread 1073742416 adjusted timing: 1.366353 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 2
Starting test with 2 threads...
Thread 1073742416 adjusted timing: 1.429485 seconds for 10000000 requests of 1024 bytes.
Thread 1073742896 adjusted timing: 1.530733 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 3
Starting test with 3 threads...
Thread 1073742416 adjusted timing: 1.419813 seconds for 10000000 requests of 1024 bytes.
Thread 1073743376 adjusted timing: 1.432790 seconds for 10000000 requests of 1024 bytes.
Thread 1073742896 adjusted timing: 1.490218 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 4
Starting test with 4 threads...
Thread 1073743376 adjusted timing: 1.447554 seconds for 10000000 requests of 1024 bytes.
Thread 1073742416 adjusted timing: 1.503659 seconds for 10000000 requests of 1024 bytes.
Thread 1073743856 adjusted timing: 1.503937 seconds for 10000000 requests of 1024 bytes.
Thread 1073742896 adjusted timing: 1.504926 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 5
Starting test with 5 threads...
Thread 1073743376 adjusted timing: 1.595239 seconds for 10000000 requests of 1024 bytes.
Thread 1073742896 adjusted timing: 1.689753 seconds for 10000000 requests of 1024 bytes.
Thread 1073742416 adjusted timing: 1.750115 seconds for 10000000 requests of 1024 bytes.
Thread 1073744336 adjusted timing: 1.744271 seconds for 10000000 requests of 1024 bytes.
Thread 1073743856 adjusted timing: 1.890269 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 6
Starting test with 6 threads...
Thread 1073743856 adjusted timing: 1.847653 seconds for 10000000 requests of 1024 bytes.
Thread 1073742416 adjusted timing: 2.018481 seconds for 10000000 requests of 1024 bytes.
Thread 1073743376 adjusted timing: 2.059817 seconds for 10000000 requests of 1024 bytes.
Thread 1073742896 adjusted timing: 2.129204 seconds for 10000000 requests of 1024 bytes.
Thread 1073744336 adjusted timing: 2.223751 seconds for 10000000 requests of 1024 bytes.
Thread 1073744816 adjusted timing: 2.293809 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 20
Starting test with 20 threads...
Thread 1073744816 adjusted timing: 5.113769 seconds for 10000000 requests of 1024 bytes.
Thread 1073751136 adjusted timing: 4.973369 seconds for 10000000 requests of 1024 bytes.
Thread 1073750176 adjusted timing: 5.295912 seconds for 10000000 requests of 1024 bytes.
Thread 1073745296 adjusted timing: 5.502331 seconds for 10000000 requests of 1024 bytes.
Thread 1073743856 adjusted timing: 5.614890 seconds for 10000000 requests of 1024 bytes.
Thread 1073744336 adjusted timing: 5.608690 seconds for 10000000 requests of 1024 bytes.
Thread 1073752096 adjusted timing: 5.555465 seconds for 10000000 requests of 1024 bytes.
Thread 1073748736 adjusted timing: 5.650922 seconds for 10000000 requests of 1024 bytes.
Thread 1073748256 adjusted timing: 6.608054 seconds for 10000000 requests of 1024 bytes.
Thread 1073750656 adjusted timing: 7.144998 seconds for 10000000 requests of 1024 bytes.
Thread 1073742896 adjusted timing: 7.390905 seconds for 10000000 requests of 1024 bytes.
Thread 1073746256 adjusted timing: 7.364728 seconds for 10000000 requests of 1024 bytes.
Thread 1073742416 adjusted timing: 7.556064 seconds for 10000000 requests of 1024 bytes.
Thread 1073749216 adjusted timing: 7.357179 seconds for 10000000 requests of 1024 bytes.
Thread 1073752576 adjusted timing: 7.349483 seconds for 10000000 requests of 1024 bytes.
Thread 1073747776 adjusted timing: 7.375179 seconds for 10000000 requests of 1024 bytes.
Thread 1073751616 adjusted timing: 7.557854 seconds for 10000000 requests of 1024 bytes.
Thread 1073743376 adjusted timing: 7.915978 seconds for 10000000 requests of 1024 bytes.
Thread 1073749696 adjusted timing: 7.795219 seconds for 10000000 requests of 1024 bytes.
Thread 1073745776 adjusted timing: 8.007392 seconds for 10000000 requests of 1024 bytes.

So libthr is *much* faster than libpthread with both malloc
implementations, but jemalloc is still 1.7 times faster than phkmalloc
for 1 thread and roughly 80 times faster for 5 threads.
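
To put numbers on both comparisons, here is a small sketch that tabulates the relevant timings and takes ratios of mean per-thread times. The use of means is an assumption, and the timings dictionary and ratio helper are illustrative names, not part of malloc-test.

```python
from statistics import mean

# Per-thread "adjusted timing" values in seconds, copied from the runs above.
# Keys are (thread library, allocator, thread count).
timings = {
    ("libpthread", "phkmalloc", 1): [4.173052],
    ("libpthread", "jemalloc", 1): [3.777175],
    ("libthr", "phkmalloc", 1): [2.357247],
    ("libthr", "jemalloc", 1): [1.366353],
    ("libthr", "phkmalloc", 5): [139.316332, 140.117720, 140.134057,
                                 140.855289, 140.865934],
    ("libthr", "jemalloc", 5): [1.595239, 1.689753, 1.750115,
                                1.744271, 1.890269],
}


def ratio(slow_key, fast_key):
    """Mean time of the slow configuration divided by that of the fast one."""
    return mean(timings[slow_key]) / mean(timings[fast_key])


# jemalloc versus phkmalloc, both under libthr.
for n in (1, 5):
    r = ratio(("libthr", "phkmalloc", n), ("libthr", "jemalloc", n))
    print(f"libthr, {n} thread(s): jemalloc is {r:.1f}x faster than phkmalloc")

# libthr versus libpthread, single thread, for each allocator.
for alloc in ("phkmalloc", "jemalloc"):
    r = ratio(("libpthread", alloc, 1), ("libthr", alloc, 1))
    print(f"{alloc}, 1 thread: libthr is {r:.1f}x faster than libpthread")
```

On these numbers the allocator ratio comes out near 1.7 for one thread and near 81 for five, and libthr is about 1.8 times faster than libpthread with phkmalloc and about 2.8 times faster with jemalloc in the single-threaded case.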

Kris

P.S. Holy crap :)



