pools: limit the number of items that can be kept in per cpu cache lists
On Thu, Jan 29, 2026 at 11:42:06AM +1000, David Gwynne wrote:
> recent events at work have made me conclude that pools are too greedy,
> and they end up holding onto free items for a lot longer than they
> should.
>
> pools are deliberately conservative about returning memory to the
> backend page allocators because it's assumed that if the system just
> used a certain number of these items, it's likely to do the same thing
> again in the future. so if you allocate a ton of memory out of a pool,
> the pool will end up holding onto that memory for a while rather than
> give it straight back to the backend page allocator.
>
> this is made worse when you enable per cpu caches on a pool. this
> effectively adds another 2 layers of free item caching, one of which has
> a feature that allows for unlimited growth of the free list size.
>
> this is the first of a bunch of little steps to try and mitigate these
> problems.
>
> the per cpu caches in pools have a feature where if they detect
> contention on the global pool, they'll grow the number of items they'll
> keep in the per cpu cache to mitigate against that contention in the
> future. the default and minimum list length is 8 items, but there's
> currently no limit to how long those lists can get.
>
> this diff limits the growth of these lists to roughly 64 (71) items. the
> important part of the change is adding the machinery to enforce the
> limit, i'm happy to fiddle with 64 in the future. i have a bunch of
> other changes in this space i want to get in before we do that tuning
> though.
>
> ok?
OK claudio@.
One comment below (which is unrelated to the diff).
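Before the diff, a quick note on where the "roughly 64 (71)" above comes
from. The snippet below is a minimal userland model of the grow/shrink
loop, not the kernel code: the constants are the ones from the diff, the
contention pattern is made up purely to drive the loop, and the
depot-size check is left out so the ceiling itself is easier to see.

/*
 * Userland sketch of the pool_cache_gc() list length tuning, using the
 * constants from the diff.  The depot-size check and the real
 * contention counters are deliberately omitted.
 */
#include <stdio.h>

#define POOL_CACHE_LIST_MIN     8
#define POOL_CACHE_LIST_MAX     64
#define POOL_CACHE_LIST_INC     8
#define POOL_CACHE_LIST_DEC     1

static unsigned int
gc_pass(unsigned int items, int contended)
{
        if (contended && items < POOL_CACHE_LIST_MAX)
                items += POOL_CACHE_LIST_INC;
        else if (!contended && items > POOL_CACHE_LIST_MIN)
                items -= POOL_CACHE_LIST_DEC;
        return (items);
}

int
main(void)
{
        unsigned int items = POOL_CACHE_LIST_MIN;
        int i;

        /* sustained contention: 8 -> 16 -> ... -> 64, then growth stops */
        for (i = 0; i < 10; i++)
                items = gc_pass(items, 1);
        printf("after sustained contention: %u\n", items);     /* 64 */

        /* one quiet pass drops the target to 63... */
        items = gc_pass(items, 0);
        /* ...and the next contended pass can still add 8, giving 71 */
        items = gc_pass(items, 1);
        printf("worst case with the cap:    %u\n", items);     /* 71 */

        return (0);
}

In short, growth stops once the target hits 64, but a single quiet pass
drops it to 63 and the next contended pass adds 8 again, so 71 is the
practical worst case per list.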
> Index: subr_pool.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/subr_pool.c,v
> diff -u -p -r1.243 subr_pool.c
> --- subr_pool.c 29 Jan 2026 01:04:35 -0000 1.243
> +++ subr_pool.c 29 Jan 2026 01:14:17 -0000
> @@ -155,6 +155,7 @@ struct pool_page_header {
>
> #ifdef MULTIPROCESSOR
> #define POOL_CACHE_LIST_MIN 8 /* minimum list length */
> +#define POOL_CACHE_LIST_MAX 64
> #define POOL_CACHE_LIST_INC 8
> #define POOL_CACHE_LIST_DEC 1
>
> @@ -2046,13 +2047,13 @@ pool_cache_gc(struct pool *pp)
>
> contention = pp->pr_cache_contention;
> delta = contention - pp->pr_cache_contention_prev;
> - if (delta > 8 /* magic */) {
> - if ((ncpusfound * POOL_CACHE_LIST_MIN * 2) <=
> - pp->pr_cache_nitems)
> - pp->pr_cache_items += POOL_CACHE_LIST_INC;
> - } else if (delta == 0) {
> - if (pp->pr_cache_items > POOL_CACHE_LIST_MIN)
> - pp->pr_cache_items -= POOL_CACHE_LIST_DEC;
> + if (delta > 8 /* magic */ &&
> + pp->pr_cache_items < POOL_CACHE_LIST_MAX &&
> + (ncpusfound * POOL_CACHE_LIST_MIN * 2) <= pp->pr_cache_nitems) {
I'm worried about
(ncpusfound * POOL_CACHE_LIST_MIN * 2) <= pp->pr_cache_nitems
I understand that you don't want to increase the pressure if the cache has
little churn. My worry is that systems with many cpus are more prone to
mtx contention, but this check makes it harder for such systems to scale
up. At least this is how I see the interaction here.
Maybe you can shed some light on this based on your experience.
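To put rough numbers on that (the cpu counts below are made-up examples;
the constant is the one from the header above):

#include <stdio.h>

#define POOL_CACHE_LIST_MIN     8

int
main(void)
{
        int cpus[] = { 2, 8, 32, 64, 128 };
        size_t i;

        /* free items that must sit in the depot before lists may grow */
        for (i = 0; i < sizeof(cpus) / sizeof(cpus[0]); i++)
                printf("%3d cpus: need %4d cached items\n",
                    cpus[i], cpus[i] * POOL_CACHE_LIST_MIN * 2);

        return (0);
}

A 2 cpu box can start growing its lists once 32 free items sit in the
global depot, while a 128 cpu box needs 2048 for the same pool, which is
the scaling interaction described above.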
> + pp->pr_cache_items += POOL_CACHE_LIST_INC;
> + } else if (delta == 0 &&
> + pp->pr_cache_items > POOL_CACHE_LIST_MIN) {
> + pp->pr_cache_items -= POOL_CACHE_LIST_DEC;
> }
> pp->pr_cache_contention_prev = contention;
> }
>
>
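For what it's worth, if each cpu keeps an active plus a previous free
list (my reading of the per cpu cache layout), the cap also bounds how
much a single pool can park in its per cpu layer. Rough sketch below;
the cpu count and item size are made-up examples and the global depot is
not counted:

#include <stdio.h>

/* practical per-list ceiling with the diff applied: 63 + 8 */
#define POOL_CACHE_LIST_WORST   71

int
main(void)
{
        int ncpus = 64;                 /* example machine */
        unsigned long itemsz = 256;     /* example pool item size */
        unsigned long items;

        /* two lists per cpu, each at most ~71 items long */
        items = (unsigned long)ncpus * 2 * POOL_CACHE_LIST_WORST;
        printf("at most ~%lu items (~%lu KB) parked in per cpu lists\n",
            items, items * itemsz / 1024);

        return (0);
}

Previously the per-list length had no upper bound at all, so this at
least turns an unbounded number into something you can reason about.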
--
:wq Claudio