Download raw body.
wake up mbuf pools when pages get released
mbufs are special for lots of reasons, but one is that the total
amount of memory that mbufs can be allocated out of is limited by
mbuf_mem_limit. all the mbuf and cluster pools are subject to that
limit, which is enforced by having these pools use a custom pool
page allocator that checks that limit and accounts for their use
of it.
the problem is the pools don't coordinate with each other. when
mbuf_mem_limit is hit, it's possible for a sleeping allocation to
wait on memory in one pool, but when memory is released by another
pool that first one doesn't know about it, and doesn't get woken
up to try and allocate pages that are now free in the backend page
allocator.
the simple fix for this is to wakeup the mbuf pools when pages are
returned to the backend mbuf page allocator. if any of the pools have
pending allocation requests, they are moved forward by the wakeup.
this means if a system does hit the mbuf mem limit and a lot of
procs/threads get stuck sleeping on mbuf allocations, there's a
better chance they can now be pushed forward if another mbuf pool
backs off and gives memory back to the system.
the wakeups are deferred to a task running in the systqmp taskq.
this is the same taskq that the pool gc ops run in. if multiple
mbuf pools have gced pages released, this debounces the wakeup calls
so they only happen once per pool gc run.
i could avoid the wakeup calls by only scheduling the task when the
current mbuf_mem_alloc value is close to mbuf_mem_limit, but the pool gc
process is the extremely slow path anyway. the ratio of pool_put
operations to m_pool_free ops is many millions to one.
im going to commit this in the next day or two unless there are
objections. oks are welcome too.
Index: uipc_mbuf.c
===================================================================
RCS file: /cvs/src/sys/kern/uipc_mbuf.c,v
diff -u -p -r1.302 uipc_mbuf.c
--- uipc_mbuf.c 6 Aug 2025 14:00:33 -0000 1.302
+++ uipc_mbuf.c 29 Jan 2026 00:32:28 -0000
@@ -81,6 +81,7 @@
#include <sys/pool.h>
#include <sys/percpu.h>
#include <sys/sysctl.h>
+#include <sys/task.h>
#include <sys/socket.h>
#include <net/if.h>
@@ -131,6 +132,9 @@ void m_zero(struct mbuf *);
unsigned long mbuf_mem_limit; /* [a] how much memory can be allocated */
unsigned long mbuf_mem_alloc; /* [a] how much memory has been allocated */
+void m_pool_wakeup(void *);
+struct task mbuf_mem_wakeup = TASK_INITIALIZER(m_pool_wakeup, NULL);
+
void *m_pool_alloc(struct pool *, int, int *);
void m_pool_free(struct pool *, void *);
@@ -212,17 +216,13 @@ mbcpuinit(void)
int
nmbclust_update(long newval)
{
- int i;
-
if (newval <= 0 || newval > LONG_MAX / MCLBYTES)
return ERANGE;
/* update the global mbuf memory limit */
atomic_store_long(&nmbclust, newval);
atomic_store_long(&mbuf_mem_limit, newval * MCLBYTES);
- pool_wakeup(&mbpool);
- for (i = 0; i < nitems(mclsizes); i++)
- pool_wakeup(&mclpools[i]);
+ task_add(systqmp, &mbuf_mem_wakeup);
return 0;
}
@@ -1471,6 +1471,18 @@ m_pool_free(struct pool *pp, void *v)
(*pool_allocator_multi.pa_free)(pp, v);
atomic_sub_long(&mbuf_mem_alloc, pp->pr_pgsize);
+
+ task_add(systqmp, &mbuf_mem_wakeup);
+}
+
+void
+m_pool_wakeup(void *null)
+{
+ int i;
+
+ pool_wakeup(&mbpool);
+ for (i = 0; i < nitems(mclsizes); i++)
+ pool_wakeup(&mclpools[i]);
}
void
wake up mbuf pools when pages get released