From: Martin Pieuchot
Subject: Re: per-CPU page caches for page faults
To: tech@openbsd.org
Date: Sun, 31 Mar 2024 19:00:30 +0200

On 19/03/24(Tue) 15:06, David Gwynne wrote:
> On Mon, Mar 18, 2024 at 08:13:43PM +0100, Martin Pieuchot wrote:
> > Diff below attaches a 16-page array to "struct cpu_info" and uses it
> > as a cache to reduce contention on the global pmemrange mutex.
> >
> > Measured performance improvements are between 7% and 13% with 16 CPUs
> > and between 19% and 33% with 32 CPUs.  -current OpenBSD doesn't scale
> > above 32 CPUs so it wouldn't be fair to compare runs with jobs spread
> > across more CPUs.  However, as you can see below, this limitation is
> > no longer true with this diff.
> >
> > kernel
> > ------
> > 16: 1m47.93s real    11m24.18s user    10m55.78s system
> > 32: 2m33.30s real    11m46.08s user    32m32.35s system (BC cold)
> >     2m02.36s real    11m55.12s user    21m40.66s system
> > 64: 2m00.72s real    11m59.59s user    25m47.63s system
> >
> > libLLVM
> > -------
> > 16: 30m45.54s real   363m25.35s user   150m34.05s system
> > 32: 24m29.88s real   409m49.80s user   311m02.54s system
> > 64: 29m22.63s real   404m16.20s user   771m31.26s system
> > 80: 30m12.49s real   398m07.01s user   816m01.71s system
> >
> > kernel+percpucaches(16)
> > ------
> > 16: 1m30.17s real    11m19.29s user     6m42.08s system
> > 32: 2m02.28s real    11m42.13s user    23m42.64s system (BC cold)
> >     1m22.82s real    11m41.72s user     8m50.12s system
> > 64: 1m23.47s real    11m56.99s user     9m42.00s system
> > 80: 1m24.63s real    11m44.24s user    10m38.00s system
> >
> > libLLVM+percpucaches(16)
> > -------
> > 16: 28m38.73s real   363m34.69s user    95m45.68s system
> > 32: 19m57.71s real   415m17.23s user   174m47.83s system
> > 64: 18m59.50s real   450m17.79s user   406m05.42s system
> > 80: 19m02.26s real   452m35.11s user   473m09.05s system
> >
> > Still, the most important impact of this diff is the reduction of
> > %sys time: it drops from ~40% with 16 CPUs, and from ~55% with 32
> > CPUs or more.
> >
> > What is the idea behind this diff?  With a significant number of CPUs
> > (16 or more), grabbing a global mutex for every page allocation & free
> > creates a lot of contention, resulting in many CPU cycles wasted in
> > system (kernel) time.  The idea of this diff is to add another layer
> > on top of the global allocator to allocate and free pages in batch.
> > Note that, in this diff, this cache is only used for page faults.
> >
> > The number of 16 has been chosen after careful testing on an 80-CPU
> > Ampere machine.  I tried to keep it as small as possible while making
> > sure that multiple parallel page faults on a large number of CPUs do
> > not result in contention.  I'd argue that "stealing" at most 64k per
> > CPU is acceptable on any MP system.
> >
> > The diff includes 3 new counters visible in "systat uvm" and
> > "vmstat -s".
> >
> > When the page daemon kicks in we drain the cache of the current CPU,
> > which is the best we can do without adding too much complexity.
> >
> > I only tested amd64 and arm64, that's why there is such a define in
> > uvm/uvm_page.c.  I'd be happy to hear from tests on other
> > architectures and different topologies.  You'll need to edit
> > $arch/include/cpu.h and modify the define.
> >
> > This diff is really interesting because it now allows us to clearly
> > see which syscalls are contending a lot.  Without surprise it's
> > kbind(2), munmap(2) and mprotect(2).  It also shows which workloads
> > are VFS-bound.  That is what the "Buffer-Cache Cold" (BC cold)
> > numbers represent above.
> > With a small number of CPUs we don't see much difference between
> > the two.
> >
> > Comments?
>
> i like the idea, and i like the improvements.
>
> this is basically the same problem that jeff bonwick deals with in
> his magazines and vmem paper about the changes he made to the solaris
> slab allocator to make it scale on machines with a bunch of cpus.
> that's the reference i used when i implemented per cpu caches in
> pools, and it's probably worth following here as well. the only
> real change i'd want you to make is to introduce the "previously
> loaded magazine" to mitigate thrashing as per section 3.1 in the
> paper.
>
> pretty exciting though.

New version that should address all previous comments:

- Use 2 magazines of 8 pages each and imitate the pool_cache code (see
  the sketch below).  The miss/hit ratio can be observed to be 1/8 with
  "systat uvm".

- Ensure that uvm_pmr_getpages() won't fail on highly fragmented
  memory, and do not wake up the pagedaemon if it fails to fully
  reload a magazine.

- Use __HAVE_UVM_PERCPU & provide UP versions of cache_get/cache_put().

- Change amap_wipeout() to call uvm_anfree() to fill the cache instead
  of bypassing it by calling uvm_pglistfree().

- Include a fix for incorrect decrementing of `uvm.swpgonly' in
  uvm_anon_release() (this should be committed independently).

I didn't do any measurements with this version but robert@ said it
shaves off 30 minutes compared to the previous one for a chromium
build w/ 32 CPUs (from 4.5h down to 4h).

Comments?  Tests?
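
Before diving into the diff, here is a minimal user-space sketch of the
two-magazine scheme from section 3.1 of Bonwick's paper.  All names in
it are illustrative stand-ins, not the diff's own: backend_get() and
backend_put() play the role of uvm_pmr_getpages()/uvm_pmr_freepages(),
and the real uvm_pmr_cache_get() below additionally tries to reload a
full magazine before falling back to the global allocator.

	/*
	 * Illustrative model of a per-CPU two-magazine cache
	 * (after Bonwick).  backend_get()/backend_put() stand in
	 * for the global page allocator.
	 */
	#include <stdlib.h>

	#define MAGSZ	8		/* objects per magazine */

	struct magazine {
		void	*objs[MAGSZ];
		int	 nobjs;
	};

	struct percpu_cache {
		struct magazine	mag[2];	/* loaded + previously loaded */
		int		actv;	/* index of the loaded magazine */
	};

	static void *
	backend_get(void)
	{
		return malloc(1);	/* stand-in for uvm_pmr_getpages() */
	}

	static void
	backend_put(void *o)
	{
		free(o);		/* stand-in for uvm_pmr_freepages() */
	}

	void *
	cache_get(struct percpu_cache *c)
	{
		struct magazine *m = &c->mag[c->actv];

		if (m->nobjs == 0) {
			int prev = !c->actv;

			/* Loaded magazine is empty, try the previous one. */
			m = &c->mag[prev];
			if (m->nobjs == 0)
				return backend_get();	/* double miss */
			c->actv = prev;			/* swap magazines */
		}
		return m->objs[--m->nobjs];
	}

	void
	cache_put(struct percpu_cache *c, void *o)
	{
		struct magazine *m = &c->mag[c->actv];

		if (m->nobjs == MAGSZ) {
			int prev = !c->actv;

			/* Loaded magazine is full: flush the other one
			 * back to the backend, then load it. */
			m = &c->mag[prev];
			while (m->nobjs > 0)
				backend_put(m->objs[--m->nobjs]);
			c->actv = prev;			/* swap magazines */
		}
		m->objs[m->nobjs++] = o;
	}

Keeping two magazines means a CPU alternating a single allocation with
a single free flips between the loaded and previously loaded magazine
instead of going back to the global allocator on every call, which is
exactly the thrashing dgwynne refers to.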
Index: usr.bin/systat/uvm.c
===================================================================
RCS file: /cvs/src/usr.bin/systat/uvm.c,v
diff -u -p -r1.6 uvm.c
--- usr.bin/systat/uvm.c	27 Nov 2022 23:18:54 -0000	1.6
+++ usr.bin/systat/uvm.c	29 Mar 2024 20:56:32 -0000
@@ -80,11 +80,10 @@ struct uvmline uvmline[] = {
 	{ &uvmexp.zeropages, &last_uvmexp.zeropages, "zeropages",
 	  &uvmexp.pageins, &last_uvmexp.pageins, "pageins",
 	  &uvmexp.fltrelckok, &last_uvmexp.fltrelckok, "fltrelckok" },
-	{ &uvmexp.reserve_pagedaemon, &last_uvmexp.reserve_pagedaemon,
-	  "reserve_pagedaemon",
+	{ &uvmexp.percpucaches, &last_uvmexp.percpucaches, "percpucaches",
 	  &uvmexp.pgswapin, &last_uvmexp.pgswapin, "pgswapin",
 	  &uvmexp.fltanget, &last_uvmexp.fltanget, "fltanget" },
-	{ &uvmexp.reserve_kernel, &last_uvmexp.reserve_kernel, "reserve_kernel",
+	{ NULL, NULL, NULL,
 	  &uvmexp.pgswapout, &last_uvmexp.pgswapout, "pgswapout",
 	  &uvmexp.fltanretry, &last_uvmexp.fltanretry, "fltanretry" },
 	{ NULL, NULL, NULL,
@@ -143,13 +142,13 @@ struct uvmline uvmline[] = {
 	  NULL, NULL, NULL },
 	{ &uvmexp.pagesize, &last_uvmexp.pagesize, "pagesize",
 	  &uvmexp.pdpending, &last_uvmexp.pdpending, "pdpending",
-	  NULL, NULL, NULL },
+	  NULL, NULL, "Per-CPU Counters" },
 	{ &uvmexp.pagemask, &last_uvmexp.pagemask, "pagemask",
 	  &uvmexp.pddeact, &last_uvmexp.pddeact, "pddeact",
-	  NULL, NULL, NULL },
+	  &uvmexp.pcphit, &last_uvmexp.pcphit, "pcphit" },
 	{ &uvmexp.pageshift, &last_uvmexp.pageshift, "pageshift",
 	  NULL, NULL, NULL,
-	  NULL, NULL, NULL }
+	  &uvmexp.pcpmiss, &last_uvmexp.pcpmiss, "pcpmiss" }
 };
 
 field_def fields_uvm[] = {
Index: usr.bin/vmstat/vmstat.c
===================================================================
RCS file: /cvs/src/usr.bin/vmstat/vmstat.c,v
diff -u -p -r1.155 vmstat.c
--- usr.bin/vmstat/vmstat.c	4 Dec 2022 23:50:50 -0000	1.155
+++ usr.bin/vmstat/vmstat.c	29 Mar 2024 20:56:32 -0000
@@ -513,7 +513,12 @@ dosum(void)
 	    uvmexp.reserve_pagedaemon);
 	(void)printf("%11u pages reserved for kernel\n",
 	    uvmexp.reserve_kernel);
+	(void)printf("%11u pages in per-cpu caches\n",
+	    uvmexp.percpucaches);
 
+	/* per-cpu cache */
+	(void)printf("%11u per-cpu cache hits\n", uvmexp.pcphit);
+	(void)printf("%11u per-cpu cache misses\n", uvmexp.pcpmiss);
 	/* swap */
 	(void)printf("%11u swap pages\n", uvmexp.swpages);
 	(void)printf("%11u swap pages in use\n", uvmexp.swpginuse);
Index: sys/uvm/uvm_amap.c
===================================================================
RCS file: /cvs/src/sys/uvm/uvm_amap.c,v
diff -u -p -r1.92 uvm_amap.c
--- sys/uvm/uvm_amap.c	11 Apr 2023 00:45:09 -0000	1.92
+++ sys/uvm/uvm_amap.c	30 Mar 2024 17:30:10 -0000
@@ -482,7 +482,6 @@ amap_wipeout(struct vm_amap *amap)
 	int slot;
 	struct vm_anon *anon;
 	struct vm_amap_chunk *chunk;
-	struct pglist pgl;
 
 	KASSERT(rw_write_held(amap->am_lock));
 	KASSERT(amap->am_ref == 0);
@@ -495,7 +494,6 @@ amap_wipeout(struct vm_amap *amap)
 		return;
 	}
 
-	TAILQ_INIT(&pgl);
 	amap_list_remove(amap);
 
 	AMAP_CHUNK_FOREACH(chunk, amap) {
@@ -515,12 +513,10 @@ amap_wipeout(struct vm_amap *amap)
 			 */
 			refs = --anon->an_ref;
 			if (refs == 0) {
-				uvm_anfree_list(anon, &pgl);
+				uvm_anfree(anon);
 			}
 		}
 	}
 
-	/* free the pages */
-	uvm_pglistfree(&pgl);
 	/*
 	 * Finally, destroy the amap.
Index: sys/uvm/uvm_anon.c
===================================================================
RCS file: /cvs/src/sys/uvm/uvm_anon.c,v
diff -u -p -r1.57 uvm_anon.c
--- sys/uvm/uvm_anon.c	27 Oct 2023 19:13:51 -0000	1.57
+++ sys/uvm/uvm_anon.c	30 Mar 2024 09:21:19 -0000
@@ -116,7 +116,7 @@ uvm_anfree_list(struct vm_anon *anon, st
 			uvm_unlock_pageq();	/* free the daemon */
 		}
 	} else {
-		if (anon->an_swslot != 0 && anon->an_swslot != SWSLOT_BAD) {
+		if (anon->an_swslot > 0) {
 			/* This page is no longer only in swap. */
 			KASSERT(uvmexp.swpgonly > 0);
 			atomic_dec_int(&uvmexp.swpgonly);
@@ -260,7 +260,8 @@ uvm_anon_release(struct vm_anon *anon)
 	uvm_unlock_pageq();
 	KASSERT(anon->an_page == NULL);
 	lock = anon->an_lock;
-	uvm_anfree(anon);
+	uvm_anon_dropswap(anon);
+	pool_put(&uvm_anon_pool, anon);
 	rw_exit(lock);
 	/* Note: extra reference is held for PG_RELEASED case. */
 	rw_obj_free(lock);
Index: sys/uvm/uvm_page.c
===================================================================
RCS file: /cvs/src/sys/uvm/uvm_page.c,v
diff -u -p -r1.174 uvm_page.c
--- sys/uvm/uvm_page.c	13 Feb 2024 10:16:28 -0000	1.174
+++ sys/uvm/uvm_page.c	31 Mar 2024 12:16:46 -0000
@@ -75,6 +75,7 @@
 #include <uvm/uvm.h>
 #include <uvm/uvm_ddb.h>
+#include <uvm/uvm_percpu.h>
 
 /*
  * for object trees
 */
@@ -120,6 +121,10 @@ static void uvm_pageinsert(struct vm_pag
 static void uvm_pageremove(struct vm_page *);
 int uvm_page_owner_locked_p(struct vm_page *);
 
+struct vm_page	*uvm_pmr_getone(int);
+struct vm_page	*uvm_pmr_cache_get(int);
+void		 uvm_pmr_cache_put(struct vm_page *);
+
 /*
  * inline functions
  */
@@ -877,13 +882,11 @@ uvm_pagerealloc_multi(struct uvm_object
  * => only one of obj or anon can be non-null
 * => caller must activate/deactivate page if it is not wired.
  */
-
 struct vm_page *
 uvm_pagealloc(struct uvm_object *obj, voff_t off, struct vm_anon *anon,
     int flags)
 {
-	struct vm_page *pg;
-	struct pglist pgl;
+	struct vm_page *pg = NULL;
 	int pmr_flags;
 
 	KASSERT(obj == NULL || anon == NULL);
@@ -906,13 +909,10 @@ uvm_pagealloc(struct uvm_object *obj, vo
 	if (flags & UVM_PGA_ZERO)
 		pmr_flags |= UVM_PLA_ZERO;
 
-	TAILQ_INIT(&pgl);
-	if (uvm_pmr_getpages(1, 0, 0, 1, 0, 1, pmr_flags, &pgl) != 0)
-		goto fail;
-
-	pg = TAILQ_FIRST(&pgl);
-	KASSERT(pg != NULL && TAILQ_NEXT(pg, pageq) == NULL);
+	pg = uvm_pmr_cache_get(pmr_flags);
+	if (pg == NULL)
+		return NULL;
 
 	uvm_pagealloc_pg(pg, obj, off, anon);
 	KASSERT((pg->pg_flags & PG_DEV) == 0);
 	if (flags & UVM_PGA_ZERO)
@@ -921,9 +921,6 @@ uvm_pagealloc(struct uvm_object *obj, vo
 		atomic_setbits_int(&pg->pg_flags, PG_CLEAN);
 
 	return pg;
-
-fail:
-	return NULL;
 }
 
 /*
@@ -1025,7 +1022,7 @@ void
 uvm_pagefree(struct vm_page *pg)
 {
 	uvm_pageclean(pg);
-	uvm_pmr_freepages(pg, 1);
+	uvm_pmr_cache_put(pg);
 }
 
 /*
@@ -1398,3 +1395,153 @@ uvm_pagecount(struct uvm_constraint_rang
 	}
 	return sz;
 }
+
+struct vm_page *
+uvm_pmr_getone(int flags)
+{
+	struct vm_page *pg;
+	struct pglist pgl;
+
+	TAILQ_INIT(&pgl);
+	if (uvm_pmr_getpages(1, 0, 0, 1, 0, 1, flags, &pgl) != 0)
+		return NULL;
+
+	pg = TAILQ_FIRST(&pgl);
+	KASSERT(pg != NULL && TAILQ_NEXT(pg, pageq) == NULL);
+
+	return pg;
+}
+
+#if defined(MULTIPROCESSOR) && defined(__HAVE_UVM_PERCPU)
+
+/*
+ * Reload a magazine.
+ */
+int
+uvm_pmr_cache_alloc(struct uvm_pmr_cache_item *upci)
+{
+	struct vm_page *pg;
+	struct pglist pgl;
+	int flags = UVM_PLA_NOWAIT|UVM_PLA_NOWAKE;
+	int npages = UVM_PMR_CACHEMAGSZ;
+
+	KASSERT(upci->upci_npages == 0);
+
+	TAILQ_INIT(&pgl);
+	if (uvm_pmr_getpages(npages, 0, 0, 1, 0, npages, flags, &pgl))
+		return -1;
+
+	while ((pg = TAILQ_FIRST(&pgl)) != NULL) {
+		TAILQ_REMOVE(&pgl, pg, pageq);
+		upci->upci_pages[upci->upci_npages] = pg;
+		upci->upci_npages++;
+	}
+	atomic_add_int(&uvmexp.percpucaches, npages);
+
+	return 0;
+}
+
+struct vm_page *
+uvm_pmr_cache_get(int flags)
+{
+	struct uvm_pmr_cache *upc = &curcpu()->ci_uvm;
+	struct uvm_pmr_cache_item *upci;
+	struct vm_page *pg;
+
+	upci = &upc->upc_magz[upc->upc_actv];
+	if (upci->upci_npages == 0) {
+		unsigned int prev;
+
+		prev = (upc->upc_actv == 0) ? 1 : 0;
+		upci = &upc->upc_magz[prev];
+		if (upci->upci_npages == 0) {
+			atomic_inc_int(&uvmexp.pcpmiss);
+			if (uvm_pmr_cache_alloc(upci))
+				return uvm_pmr_getone(flags);
+		}
+		/* Swap magazines */
+		upc->upc_actv = prev;
+	} else {
+		atomic_inc_int(&uvmexp.pcphit);
+	}
+
+	atomic_dec_int(&uvmexp.percpucaches);
+	upci->upci_npages--;
+	pg = upci->upci_pages[upci->upci_npages];
+
+	if (flags & UVM_PLA_ZERO)
+		uvm_pagezero(pg);
+
+	return pg;
+}
+
+void
+uvm_pmr_cache_free(struct uvm_pmr_cache_item *upci)
+{
+	struct pglist pgl;
+	int i;
+
+	TAILQ_INIT(&pgl);
+	for (i = 0; i < upci->upci_npages; i++)
+		TAILQ_INSERT_TAIL(&pgl, upci->upci_pages[i], pageq);
+
+	uvm_pmr_freepageq(&pgl);
+
+	atomic_sub_int(&uvmexp.percpucaches, upci->upci_npages);
+	upci->upci_npages = 0;
+	memset(upci->upci_pages, 0, sizeof(upci->upci_pages));
+}
+
+void
+uvm_pmr_cache_put(struct vm_page *pg)
+{
+	struct uvm_pmr_cache *upc = &curcpu()->ci_uvm;
+	struct uvm_pmr_cache_item *upci;
+
+	upci = &upc->upc_magz[upc->upc_actv];
+	if (upci->upci_npages >= UVM_PMR_CACHEMAGSZ) {
+		unsigned int prev;
+
+		prev = (upc->upc_actv == 0) ? 1 : 0;
+		upci = &upc->upc_magz[prev];
+		if (upci->upci_npages > 0)
+			uvm_pmr_cache_free(upci);
+
+		/* Swap magazines */
+		upc->upc_actv = prev;
+		KASSERT(upci->upci_npages == 0);
+	}
+
+	upci->upci_pages[upci->upci_npages] = pg;
+	upci->upci_npages++;
+	atomic_inc_int(&uvmexp.percpucaches);
+}
+
+void
+uvm_pmr_cache_drain(void)
+{
+	struct uvm_pmr_cache *upc = &curcpu()->ci_uvm;
+
+	uvm_pmr_cache_free(&upc->upc_magz[0]);
+	uvm_pmr_cache_free(&upc->upc_magz[1]);
+}
+
+#else /* !(MULTIPROCESSOR && __HAVE_UVM_PERCPU) */
+
+struct vm_page *
+uvm_pmr_cache_get(int flags)
+{
+	return uvm_pmr_getone(flags);
+}
+
+void
+uvm_pmr_cache_put(struct vm_page *pg)
+{
+	uvm_pmr_freepages(pg, 1);
+}
+
+void
+uvm_pmr_cache_drain(void)
+{
+}
+#endif /* MULTIPROCESSOR */
Index: sys/uvm/uvm_pdaemon.c
===================================================================
RCS file: /cvs/src/sys/uvm/uvm_pdaemon.c,v
diff -u -p -r1.110 uvm_pdaemon.c
--- sys/uvm/uvm_pdaemon.c	24 Mar 2024 10:29:35 -0000	1.110
+++ sys/uvm/uvm_pdaemon.c	30 Mar 2024 12:53:39 -0000
@@ -80,6 +80,7 @@
 #endif
 
 #include <uvm/uvm.h>
+#include <uvm/uvm_percpu.h>
 
 #include "drm.h"
 
@@ -262,6 +263,8 @@ uvm_pageout(void *arg)
 #if NDRM > 0
 		drmbackoff(size * 2);
 #endif
+		uvm_pmr_cache_drain();
+
 		/*
 		 * scan if needed
 		 */
Index: sys/uvm/uvm_percpu.h
===================================================================
RCS file: sys/uvm/uvm_percpu.h
diff -N sys/uvm/uvm_percpu.h
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ sys/uvm/uvm_percpu.h	30 Mar 2024 12:54:47 -0000
@@ -0,0 +1,45 @@
+/*	$OpenBSD$	*/
+
+/*
+ * Copyright (c) 2024 Martin Pieuchot
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#ifndef _UVM_UVM_PCPU_H_
+#define _UVM_UVM_PCPU_H_
+
+/*
+ * We want the per-CPU cache to be as small as possible while still
+ * getting rid of the `uvm_lock_fpageq' contention.
+ */
+#define UVM_PMR_CACHEMAGSZ	8	/* # of pages in a magazine */
+
+struct vm_page;
+
+/* Magazine */
+struct uvm_pmr_cache_item {
+	struct vm_page	*upci_pages[UVM_PMR_CACHEMAGSZ];
+	int		 upci_npages;	/* # of pages in magazine */
+};
+
+/* Per-CPU cache */
+struct uvm_pmr_cache {
+	struct uvm_pmr_cache_item	upc_magz[2];	/* magazines */
+	int				upc_actv; /* index of active magazine */
+
+};
+
+void	uvm_pmr_cache_drain(void);
+
+#endif /* _UVM_UVM_PCPU_H_ */
Index: sys/uvm/uvmexp.h
===================================================================
RCS file: /cvs/src/sys/uvm/uvmexp.h,v
diff -u -p -r1.12 uvmexp.h
--- sys/uvm/uvmexp.h	24 Mar 2024 10:29:35 -0000	1.12
+++ sys/uvm/uvmexp.h	29 Mar 2024 21:04:16 -0000
@@ -66,7 +66,7 @@ struct uvmexp {
 	int zeropages;		/* [F] number of zero'd pages */
 	int reserve_pagedaemon; /* [I] # of pages reserved for pagedaemon */
 	int reserve_kernel;	/* [I] # of pages reserved for kernel */
-	int unused01;		/* formerly anonpages */
+	int percpucaches;	/* [a] # of pages in per-CPU caches */
 	int vnodepages;		/* XXX # of pages used by vnode page cache */
 	int vtextpages;		/* XXX # of pages used by vtext vnodes */
 
@@ -101,8 +101,8 @@ struct uvmexp {
 	int syscalls;	/* system calls */
 	int pageins;	/* [p] pagein operation count */
 			/* pageouts are in pdpageouts below */
-	int unused07;	/* formerly obsolete_swapins */
-	int unused08;	/* formerly obsolete_swapouts */
+	int pcphit;	/* [a] # of pagealloc from per-CPU cache */
+	int pcpmiss;	/* [a] # of times a per-CPU cache was empty */
 	int pgswapin;	/* pages swapped in */
 	int pgswapout;	/* pages swapped out */
 	int forks;	/* forks */
Index: sys/arch/amd64/include/cpu.h
===================================================================
RCS file: /cvs/src/sys/arch/amd64/include/cpu.h,v
diff -u -p -r1.163 cpu.h
--- sys/arch/amd64/include/cpu.h	25 Feb 2024 19:15:50 -0000	1.163
+++ sys/arch/amd64/include/cpu.h	30 Mar 2024 12:55:27 -0000
@@ -53,6 +53,7 @@
 #include <sys/sched.h>
 #include <sys/sensors.h>
 #include <sys/srp.h>
+#include <uvm/uvm_percpu.h>
 
 #ifdef _KERNEL
 
@@ -201,6 +202,8 @@ struct cpu_info {
 
 #ifdef MULTIPROCESSOR
 	struct srp_hazard ci_srp_hazards[SRP_HAZARD_NUM];
+#define __HAVE_UVM_PERCPU
+	struct uvm_pmr_cache	ci_uvm;		/* [o] page cache */
 #endif
 
 	struct ksensordev ci_sensordev;
Index: sys/arch/arm64/include/cpu.h
===================================================================
RCS file: /cvs/src/sys/arch/arm64/include/cpu.h,v
diff -u -p -r1.43 cpu.h
--- sys/arch/arm64/include/cpu.h	25 Feb 2024 19:15:50 -0000	1.43
+++ sys/arch/arm64/include/cpu.h	30 Mar 2024 12:55:55 -0000
@@ -108,6 +108,7 @@ void arm32_vector_init(vaddr_t, int);
 #include <sys/device.h>
 #include <sys/sched.h>
 #include <sys/srp.h>
+#include <uvm/uvm_percpu.h>
 
 struct cpu_info {
 	struct device *ci_dev;	/* Device corresponding to this CPU */
@@ -161,6 +162,8 @@ struct cpu_info {
 
 #ifdef MULTIPROCESSOR
 	struct srp_hazard ci_srp_hazards[SRP_HAZARD_NUM];
+#define __HAVE_UVM_PERCPU
+	struct uvm_pmr_cache	ci_uvm;
 	volatile int ci_flags;
 
 	volatile int ci_ddb_paused;
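
For tests on other architectures: the opt-in is the same two lines in
$arch/include/cpu.h as in the amd64/arm64 hunks above.  A hypothetical
sketch (not a real hunk; member placement and the locking annotation
are per-arch):

	/* $arch/include/cpu.h -- hypothetical example */
	#include <uvm/uvm_percpu.h>	/* for struct uvm_pmr_cache */

	struct cpu_info {
		/* ... existing per-CPU members ... */
	#ifdef MULTIPROCESSOR
	#define __HAVE_UVM_PERCPU
		struct uvm_pmr_cache	ci_uvm;	/* [o] page cache */
	#endif
		/* ... */
	};

Without __HAVE_UVM_PERCPU, the #else stubs at the bottom of the
uvm_page.c hunk keep calling uvm_pmr_getone()/uvm_pmr_freepages()
directly, so unconverted architectures behave exactly as before.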