From: Martin Pieuchot <mpi@openbsd.org>
Subject: Re: per-CPU page caches for page faults
To: Mark Kettenis <mark.kettenis@xs4all.nl>
Cc: tech@openbsd.org
Date: Sun, 24 Mar 2024 11:55:45 +0100

On 18/03/24(Mon) 21:32, Mark Kettenis wrote:
> [...] 
> > The diff includes 3 new counters visible in "systat uvm" and "vmstat -s".
> > 
> > When the page daemon kicks in, we drain the cache of the current CPU,
> > which is the best we can do without adding too much complexity.
> 
> There is an interesting trade-off to be made between bunching allocs
> and frees and having free pages in the cache.  You opted to use 15 of
> the 16 slots for bunching allocs/frees.  Did you explore a more even
> split like 8/8?  Mostly just curious.

I didn't explore that.
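
For context, here is a rough sketch of the kind of per-CPU cache the
question is about; the names and the 15/1 split below are illustrative
and do not necessarily match the diff:

/*
 * Sketch only: one cache of 16 page slots per CPU.  Allocations and frees
 * are bunched by moving up to 15 pages at a time to/from the global
 * allocator, while the remaining slot keeps a free page around for the
 * next fault.
 */
#define UVM_PCPU_NSLOTS		16
#define UVM_PCPU_BUNCH		15

struct uvm_pcpu_cache {
	struct vm_page	*upc_pages[UVM_PCPU_NSLOTS];	/* cached free pages */
	int		 upc_npages;			/* pages in the cache */
};

/*
 * Called when the page daemon kicks in: give the current CPU's cached
 * pages back to the global free lists (uvm_pagefree() used for brevity).
 */
void
uvm_pcpu_cache_drain(struct uvm_pcpu_cache *upc)
{
	while (upc->upc_npages > 0)
		uvm_pagefree(upc->upc_pages[--upc->upc_npages]);
}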

> > I only tested amd64 and arm64; that's why there is such a define in
> > uvm/uvm_page.c.  I'd be happy to hear about tests on other architectures
> > and different topologies.  You'll need to edit $arch/include/cpu.h and
> > modify the define.
> 
> Instead of having the defined(__amd64__) || defined(__arm64__) check, this
> should probably be a #define __HAVE_UVM_PCU in <machine/cpu.h>.
> Unless the goal is to convert all architectures that support
> MULTIPROCESSOR kernels swiftly.

Yes, I can do that.
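
Something along these lines, using the name you suggested (sketch only):

/* In each supported <machine/cpu.h>, e.g. amd64 and arm64: */
#define __HAVE_UVM_PCU

/* In uvm/uvm_page.c, instead of the architecture list: */
#ifdef __HAVE_UVM_PCU
	/* per-CPU page cache code */
#endif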

> > This diff is really interesting because it now allows us to clearly see
> > which syscalls are contending a lot.  Unsurprisingly, these are kbind(2),
> > munmap(2) and mprotect(2).  It also shows which workloads are VFS-bound.
> > That is what the "Buffer-Cache Cold" (BC Cold) numbers represent above.
> > With a small number of CPUs we don't see much difference between the two.
> > 
> > Comments?
> 
> A few more down below.

Thanks!