Re: New libc malloc patch

From: Jon Dama (jd_at_ugcs.caltech.edu)
Date: 11/29/05

    Date: Tue, 29 Nov 2005 12:06:39 -0800 (PST)
    To: Jason Evans <jasone@canonware.com>
    
    

    I have a rather strong objection to make to this proposal (read: if this
    change goes in I'm going to have to go through the effort of ripping it
    out locally...):

    There exists a problem right now--localized to i386 and any other arch
    based on 32-bit pointers: address space is simply too scarce.

    Your decision to switch to using mmap as the exclusive source of malloc
    buckets is admirable for its modernity but it simply cannot stand unless
    someone steps up to change the way mmap and brk interact within the
    kernel.

    The trouble arises from the need to set MAXDSIZ and the effect it has
    on where the mmap region starts--which, I might add, is also where the
    shared library loader is placed. This effectively (and explicitly) sets
    the limit on how large a contiguous region can be allocated with brk.
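
A rough way to see the two regions on a running system is to compare the
current program break with the address the kernel picks for a fresh
anonymous mapping. The sketch below is only an illustration: the names
probe_layout and struct layout are made up for this example, it assumes a
platform that still provides sbrk(), and the addresses it reports depend on
the kernel, the architecture, and any address space randomization.

```c
#define _DEFAULT_SOURCE /* exposes sbrk() and MAP_ANON on glibc; ignored elsewhere */
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Snapshot of the two regions discussed above (hypothetical helper type). */
struct layout {
    uintptr_t brk_end;   /* current program break: top of the brk heap */
    uintptr_t mmap_addr; /* where the kernel placed a fresh anonymous mapping */
};

/* Fills *out and returns 0 on success, or -1 if either query failed. */
int probe_layout(struct layout *out)
{
    void *brk_now = sbrk(0); /* sbrk(0) reports the break without moving it */
    if (brk_now == (void *)-1)
        return -1;

    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    void *m = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    if (m == MAP_FAILED)
        return -1;

    out->brk_end = (uintptr_t)brk_now;
    out->mmap_addr = (uintptr_t)m;
    munmap(m, len); /* only the address was of interest */
    return 0;
}

/* Prints both addresses so the gap between them can be read off. */
void print_layout(const struct layout *l)
{
    printf("brk end:   %#jx\n", (uintmax_t)l->brk_end);
    printf("mmap addr: %#jx\n", (uintmax_t)l->mmap_addr);
}
```

Calling probe_layout() and then print_layout() from a small main() shows how
much room lies between the top of the brk heap and the mapping on a given
machine; that room is what the boundary described above caps.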

    By switching the system malloc to use mmap exclusively, you have given
    the sysadmin a strong incentive to push that brk/mmap boundary down.

    This wouldn't be a problem except that you've effectively shot in the foot
    dozens of alternative C malloc implementations, not to mention the memory
    allocator routines used in obscure languages such as Modula-3 and Haskell
    that rely on brk-derived buckets.

    This isn't playing very nicely!

    I looked into the issues and limitations with phkmalloc several months ago
    and concluded that simply adopting ptmalloc2 (the Linux malloc) was the
    better approach--notably it is willing to draw from both brk and mmap, and
    it also implements per-thread arenas.
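
To make "draw from both brk and mmap" concrete, here is a minimal sketch of
a chunk source that tries to extend the brk heap first and falls back to an
anonymous mapping when that fails or is disallowed. This is not ptmalloc2's
code; chunk_acquire, chunk_release and the allow_brk flag are hypothetical
names for this example, and a real allocator would also align the break and
take a lock around sbrk().

```c
#define _DEFAULT_SOURCE /* exposes sbrk() and MAP_ANON on glibc; ignored elsewhere */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

enum chunk_source { CHUNK_NONE, CHUNK_BRK, CHUNK_MMAP };

struct chunk {
    void *addr;               /* start of the region, NULL if none */
    size_t len;               /* requested length in bytes */
    enum chunk_source source; /* where the region came from */
};

/* Try brk first (when allowed), then fall back to mmap. */
struct chunk chunk_acquire(size_t len, int allow_brk)
{
    struct chunk c = { NULL, len, CHUNK_NONE };
    if (len == 0)
        return c;

    if (allow_brk && len <= (size_t)INTPTR_MAX) {
        void *p = sbrk((intptr_t)len); /* returns the old break on success */
        if (p != (void *)-1) {
            c.addr = p;
            c.source = CHUNK_BRK;
            return c;
        }
        /* brk space exhausted (e.g. the data size limit was hit): fall through */
    }

    void *m = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    if (m != MAP_FAILED) {
        c.addr = m;
        c.source = CHUNK_MMAP;
    }
    return c;
}

/* mmap chunks go back to the kernel; brk chunks are simply kept in this sketch. */
void chunk_release(struct chunk *c)
{
    if (c->source == CHUNK_MMAP)
        munmap(c->addr, c->len);
    c->addr = NULL;
    c->source = CHUNK_NONE;
}
```

The point of the fallback order is that neither region is starved: an
allocator built this way keeps working wherever the brk/mmap boundary is
set, at the cost of tracking which source each chunk came from.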

    There is also cause for concern about your "cache-line" business. Simply
    on the face of it there is the problem that the scheduler does not do a
    good job of pinning threads to individual CPUs. The threads are already
    bouncing from CPU to CPU and thrashing (really thrashing) each CPU cache
    along the way.

    Second, you've forgotten that there is a layer of indirection between your
    address space and the cache: the mapping of virtual pages (what you can
    see in userspace) to physical pages (the addresses of which actually
    matter for the purposes of the cache). I don't recall off-hand whether or
    not the L1 cache on i386 is tagged by virtual address, but I am certain
    that the L2 and L3 caches tag the physical addresses, not the virtual
    addresses.

    This means that your careful address selection based on cache-lines will
    only work out if it is done in the vm codepath: remember the mapping of
    physical addresses to the virtual addresses that come back from mmap can
    be delayed arbitrarily long into the future depending on when the program
    actually goes to touch that memory.
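
The effect can be worked through with a little arithmetic. The sketch below
assumes a physically indexed cache with a made-up geometry (64-byte lines,
4096 sets, 4 KiB pages); these numbers and the helper names are illustrative
assumptions, not a description of any particular CPU. It shows that the same
virtual address lands in different cache sets depending on which physical
frame the kernel eventually backs it with.

```c
#include <stdint.h>

/* Hypothetical geometry, for illustration only. */
enum {
    SKETCH_PAGE_SIZE = 4096, /* bytes per page */
    SKETCH_LINE_SIZE = 64,   /* bytes per cache line */
    SKETCH_NUM_SETS  = 4096  /* e.g. 2 MiB, 8-way: 2 MiB / (8 * 64) = 4096 sets */
};

/* Set index a cache with the geometry above would use for this address. */
uint64_t cache_set(uint64_t addr)
{
    return (addr / SKETCH_LINE_SIZE) % SKETCH_NUM_SETS;
}

/*
 * Physical address for a virtual address once the kernel has chosen a
 * physical frame for its page: only the page offset carries over.
 */
uint64_t translate(uint64_t vaddr, uint64_t frame)
{
    return frame * SKETCH_PAGE_SIZE + vaddr % SKETCH_PAGE_SIZE;
}
```

With vaddr = 0x20000040, cache_set(translate(vaddr, 0x100)) is 1 while
cache_set(translate(vaddr, 0x101)) is 65. Only the low six bits of the set
index (those inside the page offset) are fixed by the virtual address; the
rest come from the frame number, which userspace does not choose.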

    Furthermore, the answer may vary depending on the architecture or even the
    processor version.

    -Jon

    On Mon, 28 Nov 2005, Jason Evans wrote:

    > There is a patch that contains a new libc malloc implementation at:
    >
    > http://www.canonware.com/~jasone/jemalloc/jemalloc_20051127a.diff
    >
    > This implementation is very different from the current libc malloc.
    > Probably the most important difference is that this one is designed
    > with threads and SMP in mind.
    >
    > The patch has been tested for stability quite a bit already, thanks
    > mainly to Kris Kennaway. However, any help with performance testing
    > would be greatly appreciated. Specifically, I'd like to know how
    > well this malloc holds up to threaded workloads on SMP systems. If
    > you have an application that relies on threads, please let me know
    > how performance is affected.
    >
    > Naturally, if you notice horrible performance or ridiculous resident
    > memory usage, that's a bad thing and I'd like to hear about it.
    >
    > Thanks,
    > Jason
    >
    > === Important notes:
    >
    > * You need to do a full buildworld/installworld in order for the
    > patch to work correctly, due to various integration issues with the
    > threads libraries and rtld.
    >
    > * The virtual memory size of processes, as reported in the SIZE field
    > by top, will appear astronomical for almost all processes (32+ MB).
    > This is expected; it is merely an artifact of using large mmap()ed
    > regions rather than sbrk().
    >
    > * In keeping with the default option settings for CURRENT, the A and
    > J flags are enabled by default. When conducting performance tests,
    > specify MALLOC_OPTIONS="aj" .
    >
    > _______________________________________________
    > freebsd-current@freebsd.org mailing list
    > http://lists.freebsd.org/mailman/listinfo/freebsd-current
    > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
    >

