wake up mbuf pools when pages get released
On Thu, Jan 29, 2026 at 10:53:17AM +1000, David Gwynne wrote:
> mbufs are special for lots of reasons, but one is that the total
> amount of memory that mbufs can be allocated out of is limited by
> mbuf_mem_limit. all the mbuf and cluster pools are subject to that
> limit, which is enforced by having these pools use a custom pool
> page allocator that checks that limit and accounts for their use
> of it.
>
> the problem is the pools don't coordinate with each other. when
> mbuf_mem_limit is hit, it's possible for a sleeping allocation to
> wait on memory in one pool, but when memory is released by another
> pool that first one doesn't know about it, and doesn't get woken
> up to try and allocate pages that are now free in the backend page
> allocator.
>
> the simple fix for this is to wakeup the mbuf pools when pages are
> returned to the backend mbuf page allocator. if any of the pools have
> pending allocation requests, they are moved forward by the wakeup.
>
> this means if a system does hit the mbuf mem limit and a lot of
> procs/threads get stuck sleeping on mbuf allocations, there's a
> better chance they can now be pushed forward if another mbuf pool
> backs off and gives memory back to the system.
>
> the wakeups are deferred to a task running in the systqmp taskq.
> this is the same taskq that the pool gc ops run in. if the gc
> releases pages from multiple mbuf pools, this debounces the wakeup
> calls so they only happen once per pool gc run.
>
> i could avoid the wakeup calls by only scheduling the task when the
> current mbuf_mem_alloc value is close to mbuf_mem_limit, but the pool gc
> process is the extremely slow path anyway. the ratio of pool_put
> operations to m_pool_free ops is many millions to one.
>
> i'm going to commit this in the next day or two unless there are
> objections. oks are welcome too.
OK claudio@
Is the delay introduced by debouncing via the task a problem? Once the
memory is freed, some other pool may be able to grab it before the wakeup
makes it through.
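To make the concern concrete, here is a rough userland model of a shared
page budget. It is only a sketch: the model_* names are made up for
illustration, and the reserve-then-back-off logic is my assumption about
how a limit check like the one in m_pool_alloc() behaves, not a copy of
the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for mbuf_mem_limit and mbuf_mem_alloc. */
static atomic_ulong model_mem_limit = 2 * 4096;
static atomic_ulong model_mem_alloc;

/*
 * Try to charge one page against the shared budget. Assumed scheme:
 * add first, then undo the charge if the limit was exceeded.
 */
static bool
model_page_reserve(unsigned long pgsize)
{
	unsigned long nv;

	nv = atomic_fetch_add(&model_mem_alloc, pgsize) + pgsize;
	if (nv > atomic_load(&model_mem_limit)) {
		atomic_fetch_sub(&model_mem_alloc, pgsize);
		return false;
	}
	return true;
}

/* Give a page back to the shared budget, as a page free would. */
static void
model_page_release(unsigned long pgsize)
{
	atomic_fetch_sub(&model_mem_alloc, pgsize);
}
```

In this model, whichever caller reserves first after a release gets the
page. A sleeper that is only woken later can find the budget exhausted
again and has to go back to sleep.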
Also isn't m_pool_free() already running on the pool gc (and so systqmp)
taskq?
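For reference, the debounce relies on task_add(9) doing nothing when the
task is already queued (it returns 1 if the task was added and 0 if it
was already pending). Below is a small userland model of that behaviour.
The model_* names are hypothetical, and the model assumes all the frees
of one gc pass happen before the queued task gets to run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland stand-in for struct task, not the kernel type. */
struct model_task {
	void (*fn)(void *);
	void *arg;
	bool pending;
};

static int wakeups;	/* number of wakeup passes that actually ran */

/* Stand-in for m_pool_wakeup(): just count the pass. */
static void
model_pool_wakeup(void *arg)
{
	(void)arg;
	wakeups++;
}

/* Like task_add(9): 1 if newly queued, 0 if it was already pending. */
static int
model_task_add(struct model_task *t)
{
	if (t->pending)
		return 0;
	t->pending = true;
	return 1;
}

/* Run the task once if it is pending, as the taskq thread would. */
static void
model_taskq_run(struct model_task *t)
{
	if (!t->pending)
		return;
	t->pending = false;
	t->fn(t->arg);
}

/* Simulate nfrees page frees in one gc pass, then one taskq pass. */
static int
model_gc_run(int nfrees)
{
	struct model_task t = { model_pool_wakeup, NULL, false };
	int i;

	wakeups = 0;
	for (i = 0; i < nfrees; i++)
		model_task_add(&t);
	model_taskq_run(&t);
	return wakeups;
}
```

With this model, model_gc_run(5) performs a single wakeup pass, and
model_gc_run(0) performs none.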
> Index: uipc_mbuf.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/uipc_mbuf.c,v
> diff -u -p -r1.302 uipc_mbuf.c
> --- uipc_mbuf.c 6 Aug 2025 14:00:33 -0000 1.302
> +++ uipc_mbuf.c 29 Jan 2026 00:32:28 -0000
> @@ -81,6 +81,7 @@
> #include <sys/pool.h>
> #include <sys/percpu.h>
> #include <sys/sysctl.h>
> +#include <sys/task.h>
>
> #include <sys/socket.h>
> #include <net/if.h>
> @@ -131,6 +132,9 @@ void m_zero(struct mbuf *);
> unsigned long mbuf_mem_limit; /* [a] how much memory can be allocated */
> unsigned long mbuf_mem_alloc; /* [a] how much memory has been allocated */
>
> +void m_pool_wakeup(void *);
> +struct task mbuf_mem_wakeup = TASK_INITIALIZER(m_pool_wakeup, NULL);
> +
> void *m_pool_alloc(struct pool *, int, int *);
> void m_pool_free(struct pool *, void *);
>
> @@ -212,17 +216,13 @@ mbcpuinit(void)
> int
> nmbclust_update(long newval)
> {
> - int i;
> -
> if (newval <= 0 || newval > LONG_MAX / MCLBYTES)
> return ERANGE;
> /* update the global mbuf memory limit */
> atomic_store_long(&nmbclust, newval);
> atomic_store_long(&mbuf_mem_limit, newval * MCLBYTES);
>
> - pool_wakeup(&mbpool);
> - for (i = 0; i < nitems(mclsizes); i++)
> - pool_wakeup(&mclpools[i]);
> + task_add(systqmp, &mbuf_mem_wakeup);
>
> return 0;
> }
> @@ -1471,6 +1471,18 @@ m_pool_free(struct pool *pp, void *v)
> (*pool_allocator_multi.pa_free)(pp, v);
>
> atomic_sub_long(&mbuf_mem_alloc, pp->pr_pgsize);
> +
> + task_add(systqmp, &mbuf_mem_wakeup);
> +}
> +
> +void
> +m_pool_wakeup(void *null)
> +{
> + int i;
> +
> + pool_wakeup(&mbpool);
> + for (i = 0; i < nitems(mclsizes); i++)
> + pool_wakeup(&mclpools[i]);
> }
>
> void
>
--
:wq Claudio