Re: ten thousand small processes

From: Marcel Moolenaar (marcel_at_xcllnt.net)
Date: 06/26/03

    Date: Wed, 25 Jun 2003 21:38:30 -0700
    To: "D. J. Bernstein" <djb@cr.yp.to>
    
    

    On Thu, Jun 26, 2003 at 02:50:29AM -0000, D. J. Bernstein wrote:
    > Jon Mini writes:
    > > I'm sorry, but you are way off here. First of all, caches are *much
    > > larger* than the size of the processes you are talking about.
    >
    > I'm sorry, but you are being misled by a naive model of CPU performance.
    > On a typical Pentium in our department, the following program becomes
    > three times faster when SPACING is changed from 4096 to 128:
    *snip*
    > From an asm programmer's perspective, when FreeBSD decides to spread a
    > small program's variables between
    >
    > * the beginning of a data page,
    > * the beginning of a bss page,
    > * the beginning of a malloc mmap page,
    > * the beginning of a heap page,
    > * the beginning of the next heap page,
    > * the beginning of yet another heap page,
    >
    > et cetera, it is actively trying (with varying degrees of success) to
    > damage cache performance in exactly the same way that this program does.

    Just curious: do you happen to know if the performance hit is caused
    by the second order effect of having the spacing be a multiple of
    the cache associativity, thereby resulting in thrashing of a few
    cache lines, and that compacting the code results in a more uniform
    cache placement?
    In other words: is it the spacing per se that counts, or the interaction
    of a particular "distance" with cache placement?

    -- 
     Marcel Moolenaar	  USPA: A-39004		 marcel@xcllnt.net
    _______________________________________________
    freebsd-performance@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-performance
    To unsubscribe, send any mail to "freebsd-performance-unsubscribe@freebsd.org"
    
