Re: SMP problem with uma_zalloc

From: Harti Brandt (brandt_at_fokus.fraunhofer.de)
Date: 07/21/03

  • Next message: Bosko Milekic: "Re: SMP problem with uma_zalloc"
    Date: Mon, 21 Jul 2003 15:47:54 +0200 (CEST)
    To: Bosko Milekic <bmilekic@technokratis.com>
    
    

    On Mon, 21 Jul 2003, Bosko Milekic wrote:

    BM>
    BM>On Mon, Jul 21, 2003 at 09:03:00AM +0200, Harti Brandt wrote:
    BM>> On Sat, 19 Jul 2003, Bosko Milekic wrote:
    BM>>
    BM>> BM>
    BM>> BM>On Sat, Jul 19, 2003 at 08:31:26PM +0200, Lara & Harti Brandt wrote:
    BM>> BM>[...]
    BM>> BM>> Well the problem is, that nothing is starved. I have an idle machine and
    BM>> BM>> a zone that I have limited to 60 or so items. When allocating the 2nd
    BM>> BM>> item I get blocked on the zone limit. Usually I get unblocked whenever I
    BM>> BM>> free an item. This will however not happen, because I have neither
    BM>> BM>> reached the limit nor is there memory pressure in the system to which I
    BM>> BM>> could react. I simply may be blocked forever.
    BM>> BM>
    BM>> BM> UMA_ZFLAG_FULL is set on the zone prior to the msleep(). This means
    BM>> BM> that the next free will result in your wakeup, as the next free will
    BM>> BM> be sent to the zone internally, and not the pcpu cache.
    BM>>
    BM>> But there is no free to come. To explain where we have the problem:
    BM>>
    BM>> the HARP ATM code uses a zone in the IP code to allocate control blocks
    BM>> for VCCs. The zone is limited to 100 items which evaluates to 1 page.
    BM>> When I start an interface, first the signalling vcc=5 is opened. This
    BM>> allocates one item from the zone, all the other items go into the CPU
    BM>> cache. Next I start ILMI. ILMI tries to open its vcc=16. While this works
    BM>> on UP machines (the zone allocator will find a free item in the CPU
    BM>> cache), on my 2-proc machine half of the time ILMI gets blocked on the
    BM>> zone limit. And it blocks there forever, because, of course, nobody is going
    BM>> to free the one and only allocated item. On a four processor machine the
    BM>> blocking probability will be 75%.
    BM>>
    BM>> So in order to be able to get out N items from a zone (given that there is
    BM>> no shortage of memory) one has to set the limit to N + nproc *
    BM>> items_per_allocation, which one cannot do because he doesn't know
    BM>> items_per_allocation.
    BM>
    BM> It sounds to me like your example is really not the general-case one.
    BM> Basically, you're using a zone capped off at 1 page. Currently in
    BM> UMA, this is the size of the slab. So, basically, you have this whole
    BM> zone (with all associated overhead) so as to serve a maximum of only
    BM> one slab. This defeats most of the assumptions made when the zone is
    BM> created with PCPU caches. The zone maximum exists to prevent more
    BM> than the specified amount of resources from being allocated toward the
    BM> given zone; I don't think that the intention was "to ensure that if
    BM> the maximum items aren't allocated, there will always be one
    BM> available," despite the fact that that is the effective behavior on
    BM> UP.
    BM>
    BM> The solution to your really small zone problem is to either make the
    BM> zone bigger, or to hack at UMA to export the UMA_ZONE_INTERNAL API
    BM> properly so that you can skip the pcpu caches for all allocations and
    BM> go straight to the zone. I'd suggest that you make the zone bigger,
    BM> unless there's a Really Good reason not to.

    I think I will take two paths: for things like VCCs, where there may be a
    large number of them, I will just remove the limit. The limits were left
    over from when the ATM code had its own memory pool code. For things where
    there is a high probability that only a handful (usually 1 or 2) will be
    allocated (network interfaces), I will try to make them use malloc().

    What do you think about adding a paragraph on uma_zone_set_max to the man
    page?

    An upper limit on the number of items in the zone can be specified with a
    call to uma_zone_set_max. The limit applies to the total number of items,
    which includes allocated items, free items, and free items in the per-CPU
    caches. On systems with more than one CPU it may not be possible to
    allocate the specified number of items, because all of the remaining free
    items may be in the caches of the other CPUs when the limit is hit.
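
    To make the effect concrete, here is a minimal user-space sketch in C. It
    is only a toy model, not the UMA code: struct toy_zone, toy_zone_init(),
    toy_zone_alloc() and toy_safe_limit() are hypothetical names, and the model
    assumes that an empty CPU cache is refilled one whole slab at a time and
    that nothing is ever freed. The 100-items-per-slab figure is taken from the
    VCC zone described earlier in this thread.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_CPUS 8

/* Toy model of a limited zone with per-CPU caches (NOT the real UMA code). */
struct toy_zone {
	int limit;               /* item limit, as one would pass to uma_zone_set_max */
	int items_per_slab;      /* items moved into a CPU cache on each refill */
	int total;               /* items in existence: allocated + cached */
	int ncpus;
	int cache[TOY_MAX_CPUS]; /* free items sitting in each CPU's cache */
};

void
toy_zone_init(struct toy_zone *z, int limit, int items_per_slab, int ncpus)
{
	assert(ncpus > 0 && ncpus <= TOY_MAX_CPUS);
	assert(items_per_slab > 0);
	z->limit = limit;
	z->items_per_slab = items_per_slab;
	z->total = 0;
	z->ncpus = ncpus;
	for (int i = 0; i < TOY_MAX_CPUS; i++)
		z->cache[i] = 0;
}

/*
 * Allocate one item on the given CPU.  Returns false where a real
 * allocation would go to sleep on the zone limit.
 */
bool
toy_zone_alloc(struct toy_zone *z, int cpu)
{
	assert(cpu >= 0 && cpu < z->ncpus);
	if (z->cache[cpu] == 0) {
		/* Cache is empty: refill with a slab unless the limit forbids it. */
		if (z->total + z->items_per_slab > z->limit)
			return (false);
		z->total += z->items_per_slab;
		z->cache[cpu] = z->items_per_slab;
	}
	z->cache[cpu]--;
	return (true);
}

/* Limit that lets n allocations succeed in this model: n + ncpus * slab. */
int
toy_safe_limit(int n, int ncpus, int items_per_slab)
{
	return (n + ncpus * items_per_slab);
}
```

    With a limit of 100, 100 items per slab and 2 CPUs, the first
    toy_zone_alloc() on CPU 0 succeeds and leaves 99 free items in CPU 0's
    cache. A second allocation on CPU 1 then returns false, although only one
    item is in use, while a second allocation on CPU 0 still succeeds. Raising
    the limit to toy_safe_limit(2, 2, 100) lets both CPUs allocate in this
    model.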

    Regards,
    harti

    BM>
    BM> In mb_alloc (for mbufs) I had implemented something that in this sort
    BM> of scenario would dip into the other caches and transfer over what I
    BM> called a "bucket" to the current cpu cache. Although in this
    BM> scenario, it seems like that sort of solution would do what you want,
    BM> some more thought into its behavior reveals that in fact it pessimizes
    BM> the situation. To give you a better idea, let's consider what happens
    BM> in this specific scenario, where a "bucket" would be all of a page.
    BM> The allocator would make an attempt to allocate from its pcpu cache
    BM> but would find it empty, so it would then attempt to steal a bucket
    BM> from the second cpu's cache. There, it would find the bucket, move it
    BM> to its cpu's cache, and grab an item from it. However, a thread on
    BM> the second cpu may then attempt to grab an item, and the bucket will
    BM> just ping-pong from pcpu cache to pcpu cache; the problem that the
    BM> allocator was trying to solve for such really small zones was in fact
    BM> still there - because of the general assumptions made in the design
    BM> with respect to the size of most zones that it dealt with - only
    BM> instead of failing the allocation, it was pessimizing it.
    BM>
    BM>> harti
    BM>
    BM>Regards,
    BM>

    -- 
    harti brandt,
    http://www.fokus.fraunhofer.de/research/cc/cats/employees/hartmut.brandt/private
    brandt@fokus.fraunhofer.de, harti@freebsd.org
    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
    
