vfs: cap maxvnodes autogrow from bcstats.numbufs
Hello,
I am seeing a serious performance issue on OpenBSD on a hosting server
with many files and 128 GB RAM.
After running a large backup scan, for example with restic or rsync, the
kernel cache grows very large. That alone would not be a problem, but
after such a scan normal file access becomes much slower.
This is especially visible with PHP CMS workloads, where applications
perform more filesystem I/O and touch many files during a single
request. Simple websites load about 3-4 times slower, while larger PHP
CMS-based sites can become tens of times slower after the backup scan.
The server uses fast NVMe storage. In this workload, a very large
vnode/buffer cache appears to hurt performance more than it helps.
I traced the issue to sys/kern/vfs_subr.c, in getnewvnode():
```
maxvnodes = maxvnodes < bcstats.numbufs ? bcstats.numbufs
: maxvnodes;
```
Because of this, maxvnodes can grow to match bcstats.numbufs and is
never reduced afterwards. After a large backup scan this results in a
very large vnode limit, and the system keeps a huge amount of
vnode/buffer cache state.
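The grow-only effect can be reproduced outside the kernel with a small userland model. This is only a sketch: model_maxvnodes and model_numbufs are hypothetical stand-ins for maxvnodes and bcstats.numbufs, and the buffer counts mentioned below are invented for illustration, not measured.

```c
/*
 * Userland model of the autogrow line in getnewvnode().
 * model_maxvnodes and model_numbufs are hypothetical stand-ins for
 * the kernel's maxvnodes and bcstats.numbufs; this is not kernel code.
 */
static long model_maxvnodes = 5926;	/* configured limit from this report */

/* Same comparison as the kernel line: raise the limit, never lower it. */
static long
model_autogrow(long model_numbufs)
{
	if (model_maxvnodes < model_numbufs)
		model_maxvnodes = model_numbufs;
	return model_maxvnodes;
}
```

Calling model_autogrow() with a small buffer count, then a large one (as during a backup scan), then a small one again leaves the limit at the large value, which is the behavior described above.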
As a local test, I disabled this automatic maxvnodes growth. With this
change the kernel stays close to the configured kern.maxvnodes limit.
In my case kern.maxvnodes is 5926 and kern.numvnodes stays
around 11854, which matches the expected 2x behavior from vntblinit().
After applying this patch, the slowdown disappears on my workload. PHP
CMS sites return to normal response times even after large restic/rsync
backup scans.
My local test patch is:
```
Index: sys/kern/vfs_subr.c
===================================================================
RCS file: /cvs/src/sys/kern/vfs_subr.c,v
retrieving revision 1.319
diff -u -p -u -r1.319 vfs_subr.c
--- sys/kern/vfs_subr.c 3 Feb 2024 18:51:58 -0000 1.319
+++ sys/kern/vfs_subr.c 11 Feb 2025 01:52:47 -0000
@@ -379,8 +379,8 @@ getnewvnode(enum vtagtype tag, struct mo
 	 * allow maxvnodes to increase if the buffer cache itself
 	 * is big enough to justify it. (we don't shrink it ever)
 	 */
-	maxvnodes = maxvnodes < bcstats.numbufs ? bcstats.numbufs
-	    : maxvnodes;
+//	maxvnodes = maxvnodes < bcstats.numbufs ? bcstats.numbufs
+//	    : maxvnodes;
 
 	/*
 	 * We must choose whether to allocate a new vnode or recycle an
```
I also checked vfs_subr.c revision 1.333. It moves UVM vnode allocation
out of getnewvnode(), which reduces memory waste per vnode, but it does
not address this issue. The maxvnodes autogrow based on bcstats.numbufs
is still present, so maxvnodes can still grow after a large backup scan
and never shrink afterwards.
I do not claim that simply removing this code is the best final fix. It
is only a local workaround that clearly improves this workload. Maybe a
better solution would be to limit this autogrow, make it shrinkable, or
expose a tunable to control the maximum automatic vnode growth caused by
buffer cache size.
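As a rough illustration of the "limit this autogrow" option, a capped variant could look like the sketch below. It is written as a standalone userland function, not as a kernel patch. The function name and the autogrow_cap parameter are hypothetical; no such tunable exists today.

```c
/*
 * Sketch of a capped autogrow. All names here are hypothetical;
 * autogrow_cap stands for a tunable that does not exist in OpenBSD.
 */
static long
capped_autogrow(long cur_maxvnodes, long numbufs, long autogrow_cap)
{
	/* The buffer cache may raise the limit only up to the cap. */
	long target = numbufs < autogrow_cap ? numbufs : autogrow_cap;

	/* Still grow-only: a configured limit above the cap is kept. */
	return cur_maxvnodes < target ? target : cur_maxvnodes;
}
```

With a cap in place, a large backup scan could raise the limit only to the cap, and a configured limit that is already above the cap stays unchanged. This variant still never shrinks the limit, so it covers only one of the options listed above.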
I can provide more details, measurements, sysctl output, or test
alternative patches if needed.
System details:
* OpenBSD version:
* Architecture: amd64
* RAM: 128 GB
* Storage: NVMe
* Workload: hosting server, many small files, many PHP CMS installations
* Backup tools tested: restic, rsync
* kern.maxvnodes: 5926
* kern.numvnodes after patch: about 11854
Best regards,
Robert