From: David Gwynne <david@gwynne.id.au>
Subject: Re: per-CPU page caches for page faults
To: Martin Pieuchot <mpi@openbsd.org>
Cc: Openbsd Tech <tech@openbsd.org>
Date: Mon, 1 Apr 2024 12:15:17 +1000

> On 1 Apr 2024, at 03:00, Martin Pieuchot <mpi@openbsd.org> wrote:
> 
> On 19/03/24(Tue) 15:06, David Gwynne wrote:
>> On Mon, Mar 18, 2024 at 08:13:43PM +0100, Martin Pieuchot wrote:
>>> The diff below attaches a 16-page array to "struct cpu_info" and uses
>>> it as a cache to reduce contention on the global pmemrange mutex.
>>> 
>>> Measured performance improvements are between 7% and 13% with 16 CPUs
>>> and between 19% and 33% with 32 CPUs.  -current OpenBSD doesn't scale
>>> above 32 CPUs, so it wouldn't be fair to compare build times with jobs
>>> spread across more CPUs.  However, as you can see below, this
>>> limitation no longer holds with this diff.
>>> 
>>> kernel
>>> ------
>>> 16:     1m47.93s real    11m24.18s user    10m55.78s system
>>> 32:     2m33.30s real    11m46.08s user    32m32.35s system (BC cold)
>>>        2m02.36s real    11m55.12s user    21m40.66s system
>>> 64:     2m00.72s real    11m59.59s user    25m47.63s system
>>> 
>>> libLLVM
>>> -------
>>> 16:     30m45.54s real   363m25.35s user   150m34.05s system
>>> 32:     24m29.88s real   409m49.80s user   311m02.54s system
>>> 64:     29m22.63s real   404m16.20s user   771m31.26s system
>>> 80:     30m12.49s real   398m07.01s user   816m01.71s system
>>> 
>>> kernel+percpucaches(16)
>>> -----------------------
>>> 16:     1m30.17s real    11m19.29s user     6m42.08s system
>>> 32:     2m02.28s real    11m42.13s user    23m42.64s system (BC cold)
>>>        1m22.82s real    11m41.72s user     8m50.12s system
>>> 64:     1m23.47s real    11m56.99s user     9m42.00s system
>>> 80:     1m24.63s real    11m44.24s user    10m38.00s system
>>> 
>>> libLLVM+percpucaches(16)
>>> ------------------------
>>> 16:     28m38.73s real   363m34.69s user    95m45.68s system
>>> 32:     19m57.71s real   415m17.23s user   174m47.83s system
>>> 64:     18m59.50s real   450m17.79s user   406m05.42s system
>>> 80:     19m02.26s real   452m35.11s user   473m09.05s system
>>> 
>>> Still, the most important impact of this diff is the reduction of %sys
>>> time: it drops from ~40% with 16 CPUs and from ~55% with 32 CPUs or
>>> more.
>>> 
>>> What is the idea behind this diff?  With a substantial number of CPUs
>>> (16 or more), grabbing a global mutex for every page allocation & free
>>> creates a lot of contention, resulting in many CPU cycles wasted in
>>> system (kernel) time.  The idea of this diff is to add another layer
>>> on top of the global allocator to allocate and free pages in batch.
>>> Note that, in this diff, this cache is only used for page faults.
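>>>
>>> To make the idea concrete, here's a minimal sketch of such a batching
>>> layer (illustrative names only; the real diff below differs in details
>>> such as magazines, counters and fallback paths):
>>>
>>> 	#define NBATCH	16		/* pages per refill/drain */
>>>
>>> 	struct vm_page;			/* opaque for this sketch */
>>>
>>> 	/* hypothetical wrappers around the mutex-protected global pool */
>>> 	extern int  global_alloc_batch(struct vm_page **, int);
>>> 	extern void global_free_batch(struct vm_page **, int);
>>>
>>> 	struct pcp_cache {
>>> 		struct vm_page	*pc_pages[NBATCH];
>>> 		int		 pc_npages;
>>> 	};
>>>
>>> 	struct vm_page *
>>> 	pcp_get(struct pcp_cache *pc)
>>> 	{
>>> 		if (pc->pc_npages == 0) {
>>> 			/* slow path: global mutex taken once per NBATCH */
>>> 			if (global_alloc_batch(pc->pc_pages, NBATCH) != 0)
>>> 				return NULL;
>>> 			pc->pc_npages = NBATCH;
>>> 		}
>>> 		/* fast path: no shared lock touched */
>>> 		return pc->pc_pages[--pc->pc_npages];
>>> 	}
>>>
>>> 	void
>>> 	pcp_put(struct pcp_cache *pc, struct vm_page *pg)
>>> 	{
>>> 		if (pc->pc_npages == NBATCH) {
>>> 			/* full: hand everything back in one batch */
>>> 			global_free_batch(pc->pc_pages, NBATCH);
>>> 			pc->pc_npages = 0;
>>> 		}
>>> 		pc->pc_pages[pc->pc_npages++] = pg;
>>> 	}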
>>> 
>>> The number 16 was chosen after careful testing on an 80-CPU Ampere
>>> machine.  I tried to keep it as small as possible while making sure
>>> that multiple parallel page faults on a large number of CPUs do not
>>> result in contention.  I'd argue that "stealing" at most 64k per CPU
>>> is acceptable on any MP system.
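>>> (That is 16 pages * 4096 bytes = 64k per CPU, assuming the usual 4k
>>> PAGE_SIZE on amd64 and arm64.)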
>>> 
>>> The diff includes 3 new counters visible in "systat uvm" and "vmstat -s".
>>> 
>>> When the page daemon kicks in, we drain the cache of the current CPU,
>>> which is the best we can do without adding too much complexity.
>>> 
>>> I have only tested amd64 and arm64; that's why uvm/uvm_page.c guards
>>> the code with such a define.  I'd be happy to hear about tests on other
>>> architectures and different topologies.  You'll need to edit
>>> $arch/include/cpu.h and add the define there.
>>> 
>>> This diff is really interesting because it now allows us to clearly
>>> see which syscalls are contending the most.  Unsurprisingly they are
>>> kbind(2), munmap(2) and mprotect(2).  It also shows which workloads
>>> are VFS-bound.  That is what the "BC cold" (buffer-cache cold) numbers
>>> above represent.  With a small number of CPUs we don't see much
>>> difference between the two.
>>> 
>>> Comments?
>> 
>> i like the idea, and i like the improvements.
>> 
>> this is basically the same problem jeff bonwick deals with in his
>> magazines and vmem paper, which describes the changes he made to the
>> solaris slab allocator to make it scale on machines with a bunch of
>> cpus.  that's the reference i used when i implemented per cpu caches
>> in pools, and it's probably worth following here as well.  the only
>> real change i'd want you to make is to introduce the "previously
>> loaded magazine" to mitigate thrashing as per section 3.1 of the
>> paper.
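>>
>> the shape i have in mind, with made-up names (not the paper's or your
>> diff's): when the loaded magazine runs dry, fall back to the
>> previously loaded one before going to the depot, so a get/put pattern
>> sitting on an empty/full boundary swaps magazines instead of hitting
>> the global allocator every time:
>>
>> 	#define MAGSZ	8
>>
>> 	struct vm_page;
>>
>> 	struct magazine {
>> 		struct vm_page	*m_pages[MAGSZ];
>> 		int		 m_npages;
>> 	};
>>
>> 	struct pcp_cache {
>> 		struct magazine	 pc_magz[2];
>> 		int		 pc_actv;	/* loaded magazine */
>> 	};
>>
>> 	/* hypothetical depot (global allocator) helpers */
>> 	extern int depot_reload(struct magazine *);
>> 	extern struct vm_page *depot_getone(void);
>>
>> 	struct vm_page *
>> 	pcp_get(struct pcp_cache *pc)
>> 	{
>> 		struct magazine *m = &pc->pc_magz[pc->pc_actv];
>>
>> 		if (m->m_npages == 0) {
>> 			/* try the previously loaded magazine first */
>> 			pc->pc_actv = !pc->pc_actv;
>> 			m = &pc->pc_magz[pc->pc_actv];
>> 			if (m->m_npages == 0 && depot_reload(m) != 0)
>> 				return depot_getone();	/* slow path */
>> 		}
>> 		return m->m_pages[--m->m_npages];
>> 	}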
>> 
>> pretty exciting though.
> 
> New version that should address all previous comments:
> 
> - Use 2 magazines of 8 pages and imitate the pool_cache code.  The
>  miss/hit ratio can be observed to be 1/8 with "systat uvm": each miss
>  reloads a full magazine of 8 pages, which then serve the following
>  allocations locally, so roughly one miss per eight allocations.
> 
> - Ensure that uvm_pmr_getpages() won't fail with highly fragmented
>  memory and do not wake up the pagedaemon if it fails to fully reload a
>  magazine.
> 
> - Use __HAVE_UVM_PERCPU & provide UP versions of cache_get/cache_put().
> 
> - Change amap_wipeout() to call uvm_anfree() to fill the cache instead of
>  bypassing it by calling uvm_pglistfree(). 
> 
> - Include a fix for incorrect decrementing of `uvmexp.swpgonly' in
>  uvm_anon_release() (should be committed independently).
> 
> I didn't do any measurements with this version, but robert@ said it
> shaves off 30 minutes compared to the previous one for a chromium build
> w/ 32 CPUs (from 4.5h down to 4h).

so a chromium build with your first diff is 4.5h? or a vanilla kernel is 4.5h?

> 
> Comments?  Tests?
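> 
> A quick way to eyeball the new counters after a run:
> 
> 	$ vmstat -s | grep per-cpu
> 
> which picks up the "pages in per-cpu caches", "per-cpu cache hits" and
> "per-cpu cache misses" lines added below.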
> 
> Index: usr.bin/systat/uvm.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/systat/uvm.c,v
> diff -u -p -r1.6 uvm.c
> --- usr.bin/systat/uvm.c 27 Nov 2022 23:18:54 -0000 1.6
> +++ usr.bin/systat/uvm.c 29 Mar 2024 20:56:32 -0000
> @@ -80,11 +80,10 @@ struct uvmline uvmline[] = {
> { &uvmexp.zeropages, &last_uvmexp.zeropages, "zeropages",
>   &uvmexp.pageins, &last_uvmexp.pageins, "pageins",
>   &uvmexp.fltrelckok, &last_uvmexp.fltrelckok, "fltrelckok" },
> - { &uvmexp.reserve_pagedaemon, &last_uvmexp.reserve_pagedaemon,
> -   "reserve_pagedaemon",
> + { &uvmexp.percpucaches, &last_uvmexp.percpucaches, "percpucaches",
>   &uvmexp.pgswapin, &last_uvmexp.pgswapin, "pgswapin",
>   &uvmexp.fltanget, &last_uvmexp.fltanget, "fltanget" },
> - { &uvmexp.reserve_kernel, &last_uvmexp.reserve_kernel, "reserve_kernel",
> + { NULL, NULL, NULL,
>   &uvmexp.pgswapout, &last_uvmexp.pgswapout, "pgswapout",
>   &uvmexp.fltanretry, &last_uvmexp.fltanretry, "fltanretry" },
> { NULL, NULL, NULL,
> @@ -143,13 +142,13 @@ struct uvmline uvmline[] = {
>   NULL, NULL, NULL },
> { &uvmexp.pagesize, &last_uvmexp.pagesize, "pagesize",
>   &uvmexp.pdpending, &last_uvmexp.pdpending, "pdpending",
> -   NULL, NULL, NULL },
> +   NULL, NULL, "Per-CPU Counters" },
> { &uvmexp.pagemask, &last_uvmexp.pagemask, "pagemask",
>   &uvmexp.pddeact, &last_uvmexp.pddeact, "pddeact",
> -   NULL, NULL, NULL },
> +   &uvmexp.pcphit, &last_uvmexp.pcphit, "pcphit" },
> { &uvmexp.pageshift, &last_uvmexp.pageshift, "pageshift",
>   NULL, NULL, NULL,
> -   NULL, NULL, NULL }
> +   &uvmexp.pcpmiss, &last_uvmexp.pcpmiss, "pcpmiss" }
> };
> 
> field_def fields_uvm[] = {
> Index: usr.bin/vmstat/vmstat.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/vmstat/vmstat.c,v
> diff -u -p -r1.155 vmstat.c
> --- usr.bin/vmstat/vmstat.c 4 Dec 2022 23:50:50 -0000 1.155
> +++ usr.bin/vmstat/vmstat.c 29 Mar 2024 20:56:32 -0000
> @@ -513,7 +513,12 @@ dosum(void)
>      uvmexp.reserve_pagedaemon);
> (void)printf("%11u pages reserved for kernel\n",
>      uvmexp.reserve_kernel);
> + (void)printf("%11u pages in per-cpu caches\n",
> +      uvmexp.percpucaches);
> 
> + /* per-cpu cache */
> + (void)printf("%11u per-cpu cache hits\n", uvmexp.pcphit);
> + (void)printf("%11u per-cpu cache misses\n", uvmexp.pcpmiss);
> /* swap */
> (void)printf("%11u swap pages\n", uvmexp.swpages);
> (void)printf("%11u swap pages in use\n", uvmexp.swpginuse);
> Index: sys/uvm/uvm_amap.c
> ===================================================================
> RCS file: /cvs/src/sys/uvm/uvm_amap.c,v
> diff -u -p -r1.92 uvm_amap.c
> --- sys/uvm/uvm_amap.c 11 Apr 2023 00:45:09 -0000 1.92
> +++ sys/uvm/uvm_amap.c 30 Mar 2024 17:30:10 -0000
> @@ -482,7 +482,6 @@ amap_wipeout(struct vm_amap *amap)
> int slot;
> struct vm_anon *anon;
> struct vm_amap_chunk *chunk;
> - struct pglist pgl;
> 
> KASSERT(rw_write_held(amap->am_lock));
> KASSERT(amap->am_ref == 0);
> @@ -495,7 +494,6 @@ amap_wipeout(struct vm_amap *amap)
> return;
> }
> 
> - TAILQ_INIT(&pgl);
> amap_list_remove(amap);
> 
> AMAP_CHUNK_FOREACH(chunk, amap) {
> @@ -515,12 +513,10 @@ amap_wipeout(struct vm_amap *amap)
>  */
> refs = --anon->an_ref;
> if (refs == 0) {
> - uvm_anfree_list(anon, &pgl);
> + uvm_anfree(anon);
> }
> }
> }
> - /* free the pages */
> - uvm_pglistfree(&pgl);
> 
> /*
>  * Finally, destroy the amap.
> Index: sys/uvm/uvm_anon.c
> ===================================================================
> RCS file: /cvs/src/sys/uvm/uvm_anon.c,v
> diff -u -p -r1.57 uvm_anon.c
> --- sys/uvm/uvm_anon.c 27 Oct 2023 19:13:51 -0000 1.57
> +++ sys/uvm/uvm_anon.c 30 Mar 2024 09:21:19 -0000
> @@ -116,7 +116,7 @@ uvm_anfree_list(struct vm_anon *anon, st
> uvm_unlock_pageq(); /* free the daemon */
> }
> } else {
> - if (anon->an_swslot != 0 && anon->an_swslot != SWSLOT_BAD) {
> + if (anon->an_swslot > 0) {
> /* This page is no longer only in swap. */
> KASSERT(uvmexp.swpgonly > 0);
> atomic_dec_int(&uvmexp.swpgonly);
> @@ -260,7 +260,8 @@ uvm_anon_release(struct vm_anon *anon)
> uvm_unlock_pageq();
> KASSERT(anon->an_page == NULL);
> lock = anon->an_lock;
> - uvm_anfree(anon);
> + uvm_anon_dropswap(anon);
> + pool_put(&uvm_anon_pool, anon);
> rw_exit(lock);
> /* Note: extra reference is held for PG_RELEASED case. */
> rw_obj_free(lock);
> Index: sys/uvm/uvm_page.c
> ===================================================================
> RCS file: /cvs/src/sys/uvm/uvm_page.c,v
> diff -u -p -r1.174 uvm_page.c
> --- sys/uvm/uvm_page.c 13 Feb 2024 10:16:28 -0000 1.174
> +++ sys/uvm/uvm_page.c 31 Mar 2024 12:16:46 -0000
> @@ -75,6 +75,7 @@
> #include <sys/smr.h>
> 
> #include <uvm/uvm.h>
> +#include <uvm/uvm_percpu.h>
> 
> /*
>  * for object trees
> @@ -120,6 +121,10 @@ static void uvm_pageinsert(struct vm_pag
> static void uvm_pageremove(struct vm_page *);
> int uvm_page_owner_locked_p(struct vm_page *);
> 
> +struct vm_page *uvm_pmr_getone(int);
> +struct vm_page *uvm_pmr_cache_get(int);
> +void uvm_pmr_cache_put(struct vm_page *);
> +
> /*
>  * inline functions
>  */
> @@ -877,13 +882,11 @@ uvm_pagerealloc_multi(struct uvm_object 
>  * => only one of obj or anon can be non-null
>  * => caller must activate/deactivate page if it is not wired.
>  */
> -
> struct vm_page *
> uvm_pagealloc(struct uvm_object *obj, voff_t off, struct vm_anon *anon,
>     int flags)
> {
> - struct vm_page *pg;
> - struct pglist pgl;
> + struct vm_page *pg = NULL;
> int pmr_flags;
> 
> KASSERT(obj == NULL || anon == NULL);
> @@ -906,13 +909,10 @@ uvm_pagealloc(struct uvm_object *obj, vo
> 
> if (flags & UVM_PGA_ZERO)
> pmr_flags |= UVM_PLA_ZERO;
> - TAILQ_INIT(&pgl);
> - if (uvm_pmr_getpages(1, 0, 0, 1, 0, 1, pmr_flags, &pgl) != 0)
> - goto fail;
> -
> - pg = TAILQ_FIRST(&pgl);
> - KASSERT(pg != NULL && TAILQ_NEXT(pg, pageq) == NULL);
> 
> + pg = uvm_pmr_cache_get(pmr_flags);
> + if (pg == NULL)
> + return NULL;
> uvm_pagealloc_pg(pg, obj, off, anon);
> KASSERT((pg->pg_flags & PG_DEV) == 0);
> if (flags & UVM_PGA_ZERO)
> @@ -921,9 +921,6 @@ uvm_pagealloc(struct uvm_object *obj, vo
> atomic_setbits_int(&pg->pg_flags, PG_CLEAN);
> 
> return pg;
> -
> -fail:
> - return NULL;
> }
> 
> /*
> @@ -1025,7 +1022,7 @@ void
> uvm_pagefree(struct vm_page *pg)
> {
> uvm_pageclean(pg);
> - uvm_pmr_freepages(pg, 1);
> + uvm_pmr_cache_put(pg);
> }
> 
> /*
> @@ -1398,3 +1395,153 @@ uvm_pagecount(struct uvm_constraint_rang
> }
> return sz;
> }
> +
> +struct vm_page *
> +uvm_pmr_getone(int flags)
> +{
> + struct vm_page *pg;
> + struct pglist pgl;
> +
> + TAILQ_INIT(&pgl);
> + if (uvm_pmr_getpages(1, 0, 0, 1, 0, 1, flags, &pgl) != 0)
> + return NULL;
> +
> + pg = TAILQ_FIRST(&pgl);
> + KASSERT(pg != NULL && TAILQ_NEXT(pg, pageq) == NULL);
> +
> + return pg;
> +}
> +
> +#if defined(MULTIPROCESSOR) && defined(__HAVE_UVM_PERCPU)
> +
> +/*
> + * Reload a magazine.
> + */
> +int
> +uvm_pmr_cache_alloc(struct uvm_pmr_cache_item *upci)
> +{
> + struct vm_page *pg;
> + struct pglist pgl;
> + int flags = UVM_PLA_NOWAIT|UVM_PLA_NOWAKE;
> + int npages = UVM_PMR_CACHEMAGSZ;
> +
> + KASSERT(upci->upci_npages == 0);
> +
> + TAILQ_INIT(&pgl);
> + if (uvm_pmr_getpages(npages, 0, 0, 1, 0, npages, flags, &pgl))
> + return -1;
> +
> + while ((pg = TAILQ_FIRST(&pgl)) != NULL) {
> + TAILQ_REMOVE(&pgl, pg, pageq);
> + upci->upci_pages[upci->upci_npages] = pg;
> + upci->upci_npages++;
> + }
> + atomic_add_int(&uvmexp.percpucaches, npages);
> +
> + return 0;
> +}
> +
> +struct vm_page *
> +uvm_pmr_cache_get(int flags)
> +{
> + struct uvm_pmr_cache *upc = &curcpu()->ci_uvm;
> + struct uvm_pmr_cache_item *upci;
> + struct vm_page *pg;
> +
> + upci = &upc->upc_magz[upc->upc_actv];
> + if (upci->upci_npages == 0) {
> + unsigned int prev;
> +
> + prev = (upc->upc_actv == 0) ?  1 : 0;
> + upci = &upc->upc_magz[prev];
> + if (upci->upci_npages == 0) {
> + atomic_inc_int(&uvmexp.pcpmiss);
> + if (uvm_pmr_cache_alloc(upci))
> + return uvm_pmr_getone(flags);
> + }
> + /* Swap magazines */
> + upc->upc_actv = prev;
> + } else {
> + atomic_inc_int(&uvmexp.pcphit);
> + }
> +
> + atomic_dec_int(&uvmexp.percpucaches);
> + upci->upci_npages--;
> + pg = upci->upci_pages[upci->upci_npages];
> +
> + if (flags & UVM_PLA_ZERO)
> + uvm_pagezero(pg);
> +
> + return pg;
> +}
> +
> +void
> +uvm_pmr_cache_free(struct uvm_pmr_cache_item *upci)
> +{
> + struct pglist pgl;
> + int i;
> +
> + TAILQ_INIT(&pgl);
> + for (i = 0; i < upci->upci_npages; i++)
> + TAILQ_INSERT_TAIL(&pgl, upci->upci_pages[i], pageq);
> +
> + uvm_pmr_freepageq(&pgl);
> +
> + atomic_sub_int(&uvmexp.percpucaches, upci->upci_npages);
> + upci->upci_npages = 0;
> + memset(upci->upci_pages, 0, sizeof(upci->upci_pages));
> +}
> +
> +void
> +uvm_pmr_cache_put(struct vm_page *pg)
> +{
> + struct uvm_pmr_cache *upc = &curcpu()->ci_uvm;
> + struct uvm_pmr_cache_item *upci;
> +
> + upci = &upc->upc_magz[upc->upc_actv];
> + if (upci->upci_npages >= UVM_PMR_CACHEMAGSZ) {
> + unsigned int prev;
> +
> + prev = (upc->upc_actv == 0) ?  1 : 0;
> + upci = &upc->upc_magz[prev];
> + if (upci->upci_npages > 0)
> + uvm_pmr_cache_free(upci);
> +
> + /* Swap magazines */
> + upc->upc_actv = prev;
> + KASSERT(upci->upci_npages == 0);
> + }
> +
> + upci->upci_pages[upci->upci_npages] = pg;
> + upci->upci_npages++;
> + atomic_inc_int(&uvmexp.percpucaches);
> +}
> +
> +void
> +uvm_pmr_cache_drain(void)
> +{
> + struct uvm_pmr_cache *upc = &curcpu()->ci_uvm;
> +
> + uvm_pmr_cache_free(&upc->upc_magz[0]);
> + uvm_pmr_cache_free(&upc->upc_magz[1]);
> +}
> +
> +#else /* !(MULTIPROCESSOR && __HAVE_UVM_PERCPU) */
> +
> +struct vm_page *
> +uvm_pmr_cache_get(int flags)
> +{
> + return uvm_pmr_getone(flags);
> +}
> +
> +void
> +uvm_pmr_cache_put(struct vm_page *pg)
> +{
> + uvm_pmr_freepages(pg, 1);
> +}
> +
> +void
> +uvm_pmr_cache_drain(void)
> +{
> +}
> +#endif /* MULTIPROCESSOR && __HAVE_UVM_PERCPU */
> Index: sys/uvm/uvm_pdaemon.c
> ===================================================================
> RCS file: /cvs/src/sys/uvm/uvm_pdaemon.c,v
> diff -u -p -r1.110 uvm_pdaemon.c
> --- sys/uvm/uvm_pdaemon.c 24 Mar 2024 10:29:35 -0000 1.110
> +++ sys/uvm/uvm_pdaemon.c 30 Mar 2024 12:53:39 -0000
> @@ -80,6 +80,7 @@
> #endif
> 
> #include <uvm/uvm.h>
> +#include <uvm/uvm_percpu.h>
> 
> #include "drm.h"
> 
> @@ -262,6 +263,8 @@ uvm_pageout(void *arg)
> #if NDRM > 0
> drmbackoff(size * 2);
> #endif
> + uvm_pmr_cache_drain();
> +
> /*
>  * scan if needed
>  */
> Index: sys/uvm/uvm_percpu.h
> ===================================================================
> RCS file: sys/uvm/uvm_percpu.h
> diff -N sys/uvm/uvm_percpu.h
> --- /dev/null 1 Jan 1970 00:00:00 -0000
> +++ sys/uvm/uvm_percpu.h 30 Mar 2024 12:54:47 -0000
> @@ -0,0 +1,45 @@
> +/* $OpenBSD$ */
> +
> +/*
> + * Copyright (c) 2024 Martin Pieuchot <mpi@openbsd.org>
> + *
> + * Permission to use, copy, modify, and distribute this software for any
> + * purpose with or without fee is hereby granted, provided that the above
> + * copyright notice and this permission notice appear in all copies.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> + */
> +
> +#ifndef _UVM_UVM_PCPU_H_
> +#define _UVM_UVM_PCPU_H_
> +
> +/*
> + * We want the per-CPU cache to be as small as possible while still
> + * getting rid of the `uvm_lock_fpageq' contention.
> + */
> +#define UVM_PMR_CACHEMAGSZ 8 /* # of pages in a magazine */
> +
> +struct vm_page;
> +
> +/* Magazine */
> +struct uvm_pmr_cache_item {
> + struct vm_page *upci_pages[UVM_PMR_CACHEMAGSZ];
> + int  upci_npages; /* # of pages in magazine */
> +};
> +
> +/* Per-CPU cache */
> +struct uvm_pmr_cache {
> + struct uvm_pmr_cache_item upc_magz[2]; /* magazines */
> + int   upc_actv; /* index of active magazine */
> +
> +};
> +
> +void uvm_pmr_cache_drain(void);
> +
> +#endif /* _UVM_UVM_PCPU_H_ */
> Index: sys/uvm/uvmexp.h
> ===================================================================
> RCS file: /cvs/src/sys/uvm/uvmexp.h,v
> diff -u -p -r1.12 uvmexp.h
> --- sys/uvm/uvmexp.h 24 Mar 2024 10:29:35 -0000 1.12
> +++ sys/uvm/uvmexp.h 29 Mar 2024 21:04:16 -0000
> @@ -66,7 +66,7 @@ struct uvmexp {
> int zeropages; /* [F] number of zero'd pages */
> int reserve_pagedaemon; /* [I] # of pages reserved for pagedaemon */
> int reserve_kernel; /* [I] # of pages reserved for kernel */
> - int unused01; /* formerly anonpages */
> + int percpucaches; /* [a] # of pages in per-CPU caches */
> int vnodepages; /* XXX # of pages used by vnode page cache */
> int vtextpages; /* XXX # of pages used by vtext vnodes */
> 
> @@ -101,8 +101,8 @@ struct uvmexp {
> int syscalls; /* system calls */
> int pageins; /* [p] pagein operation count */
> /* pageouts are in pdpageouts below */
> - int unused07;           /* formerly obsolete_swapins */
> - int unused08;           /* formerly obsolete_swapouts */
> + int pcphit; /* [a] # of pagealloc from per-CPU cache */
> + int pcpmiss; /* [a] # of times a per-CPU cache was empty */
> int pgswapin; /* pages swapped in */
> int pgswapout; /* pages swapped out */
> int forks;   /* forks */
> Index: sys/arch/amd64/include/cpu.h
> ===================================================================
> RCS file: /cvs/src/sys/arch/amd64/include/cpu.h,v
> diff -u -p -r1.163 cpu.h
> --- sys/arch/amd64/include/cpu.h 25 Feb 2024 19:15:50 -0000 1.163
> +++ sys/arch/amd64/include/cpu.h 30 Mar 2024 12:55:27 -0000
> @@ -53,6 +53,7 @@
> #include <sys/sched.h>
> #include <sys/sensors.h>
> #include <sys/srp.h>
> +#include <uvm/uvm_percpu.h>
> 
> #ifdef _KERNEL
> 
> @@ -201,6 +202,8 @@ struct cpu_info {
> 
> #ifdef MULTIPROCESSOR
> struct srp_hazard ci_srp_hazards[SRP_HAZARD_NUM];
> +#define __HAVE_UVM_PERCPU
> + struct uvm_pmr_cache ci_uvm; /* [o] page cache */
> #endif
> 
> struct ksensordev ci_sensordev;
> Index: sys/arch/arm64/include/cpu.h
> ===================================================================
> RCS file: /cvs/src/sys/arch/arm64/include/cpu.h,v
> diff -u -p -r1.43 cpu.h
> --- sys/arch/arm64/include/cpu.h 25 Feb 2024 19:15:50 -0000 1.43
> +++ sys/arch/arm64/include/cpu.h 30 Mar 2024 12:55:55 -0000
> @@ -108,6 +108,7 @@ void arm32_vector_init(vaddr_t, int);
> #include <sys/device.h>
> #include <sys/sched.h>
> #include <sys/srp.h>
> +#include <uvm/uvm_percpu.h>
> 
> struct cpu_info {
> struct device *ci_dev; /* Device corresponding to this CPU */
> @@ -161,6 +162,8 @@ struct cpu_info {
> 
> #ifdef MULTIPROCESSOR
> struct srp_hazard ci_srp_hazards[SRP_HAZARD_NUM];
> +#define __HAVE_UVM_PERCPU
> + struct uvm_pmr_cache ci_uvm;
> volatile int ci_flags;
> 
> volatile int ci_ddb_paused;