per-CPU page caches for page faults
On 18/03/24(Mon) 21:32, Mark Kettenis wrote:
> [...]
> > The diff includes 3 new counters visible in "systat uvm" and "vmstat -s".
> >
> > When the page daemon kicks in we drain the cache of the current CPU,
> > which is the best we can do without adding too much complexity.
>
> There is an interesting trade-off to be made between bunching allocs
> and frees and having free pages in the cache.  You opted to use 15 of
> the 16 slots for bunching allocs/frees.  Did you explore a more even
> split like 8/8?  Mostly just curious.

I didn't explore that.

> > I only tested amd64 and arm64, that's why there is such a define in
> > uvm/uvm_page.c.  I'd be happy to hear from tests on other architectures
> > and different topologies.  You'll need to edit $arch/include/cpu.h and
> > modify the define.
>
> Instead of having the defined(__amd64__) || defined(__arm64__) this
> should probably be a #define __HAVE_UVM_PCU in <machine/cpu.h>.
> Unless the goal is to convert all architectures that support
> MULTIPROCESSOR kernels swiftly.

Yes, I can do that.

> > This diff is really interesting because it now allows us to clearly see
> > which syscalls are contending a lot.  Unsurprisingly, those are kbind(2),
> > munmap(2) and mprotect(2).  It also shows which workloads are VFS-bound.
> > That is what the "Buffer-Cache Cold" (BC Cold) numbers represent above.
> >
> > With a small number of CPUs we don't see much difference between the two.
> >
> > Comments?
>
> A few more down below.

Thanks!
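For readers following along without the diff, here is a minimal sketch of
the kind of per-CPU page magazine being discussed.  All names below
(pcpu_cache, pcpu_cache_get/put/drain, global_page_alloc/free, NSLOTS,
NBATCH) are hypothetical, not the identifiers from the actual diff.  The
NBATCH constant captures the trade-off Mark raises: with 15 of the 16
slots used for bunching, only one page stays resident between a refill
and a flush, whereas an 8/8 split would keep up to eight pages cached
per CPU at the cost of smaller batches.

	#include <stddef.h>

	struct vm_page;

	/* Hypothetical batched interfaces to the global free-page list. */
	int	global_page_alloc(struct vm_page **, int);
	void	global_page_free(struct vm_page **, int);

	#define NSLOTS	16	/* magazine capacity */
	#define NBATCH	15	/* pages moved per refill/flush; 8 gives an 8/8 split */

	struct pcpu_cache {
		struct vm_page	*pc_pages[NSLOTS];
		int		 pc_npages;
	};

	struct vm_page *
	pcpu_cache_get(struct pcpu_cache *pc)
	{
		/* Miss: refill up to NBATCH pages in one trip to the global list. */
		if (pc->pc_npages == 0)
			pc->pc_npages = global_page_alloc(pc->pc_pages, NBATCH);
		if (pc->pc_npages == 0)
			return (NULL);	/* caller falls back to the slow path */
		return (pc->pc_pages[--pc->pc_npages]);
	}

	void
	pcpu_cache_put(struct pcpu_cache *pc, struct vm_page *pg)
	{
		/* Full: flush NBATCH pages back, keeping NSLOTS - NBATCH cached. */
		if (pc->pc_npages == NSLOTS) {
			pc->pc_npages -= NBATCH;
			global_page_free(&pc->pc_pages[pc->pc_npages], NBATCH);
		}
		pc->pc_pages[pc->pc_npages++] = pg;
	}

	void
	pcpu_cache_drain(struct pcpu_cache *pc)
	{
		/* Page daemon: return everything from the current CPU's cache. */
		global_page_free(pc->pc_pages, pc->pc_npages);
		pc->pc_npages = 0;
	}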
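The machine-dependent define Mark suggests would look roughly like this,
assuming __HAVE_UVM_PCU as the feature macro; which architectures define
it is up to each port:

	/* In arch/amd64/include/cpu.h (and arm64, once tested): */
	#define __HAVE_UVM_PCU

	/* In uvm/uvm_page.c, replacing the architecture list: */
	#ifdef __HAVE_UVM_PCU
		/* set up and use the per-CPU page caches */
	#else
		/* always allocate from the global free-page list */
	#endif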