From: Mark Kettenis
Subject: Re: per-CPU page caches for page faults
To: Martin Pieuchot
Cc: tech@openbsd.org
Date: Mon, 18 Mar 2024 21:32:35 +0100

> Date: Mon, 18 Mar 2024 20:13:43 +0100
> From: Martin Pieuchot
> Content-Type: text/plain; charset=utf-8
> Content-Disposition: inline
>
> Diff below attaches a 16-page array to the "struct cpu_info" and uses it
> as a cache to reduce contention on the global pmemrange mutex.
>
> Measured performance improvements are between 7% and 13% with 16 CPUs
> and between 19% and 33% with 32 CPUs.  -current OpenBSD doesn't scale
> above 32 CPUs so it wouldn't be fair to compare numbers with jobs
> spread across more CPUs.  However, as you can see below, this
> limitation is no longer true with this diff.
>
> kernel
> ------
> 16: 1m47.93s real    11m24.18s user    10m55.78s system
> 32: 2m33.30s real    11m46.08s user    32m32.35s system (BC cold)
>     2m02.36s real    11m55.12s user    21m40.66s system
> 64: 2m00.72s real    11m59.59s user    25m47.63s system
>
> libLLVM
> -------
> 16: 30m45.54s real   363m25.35s user   150m34.05s system
> 32: 24m29.88s real   409m49.80s user   311m02.54s system
> 64: 29m22.63s real   404m16.20s user   771m31.26s system
> 80: 30m12.49s real   398m07.01s user   816m01.71s system
>
> kernel+percpucaches(16)
> ------
> 16: 1m30.17s real    11m19.29s user     6m42.08s system
> 32: 2m02.28s real    11m42.13s user    23m42.64s system (BC cold)
>     1m22.82s real    11m41.72s user     8m50.12s system
> 64: 1m23.47s real    11m56.99s user     9m42.00s system
> 80: 1m24.63s real    11m44.24s user    10m38.00s system
>
> libLLVM+percpucaches(16)
> -------
> 16: 28m38.73s real   363m34.69s user    95m45.68s system
> 32: 19m57.71s real   415m17.23s user   174m47.83s system
> 64: 18m59.50s real   450m17.79s user   406m05.42s system
> 80: 19m02.26s real   452m35.11s user   473m09.05s system
>
> Still the most important impact of this diff is the reduction of %sys
> time.  It drops from ~40% with 16 CPUs and from ~55% with 32 CPUs or more.

Not much improvement on my 10-core M2 Pro system:

kernel
------
1m05.10s real     5m14.56s user     2m53.34s system

kernel+percpucaches(16)
------
1m03.92s real     5m06.53s user     2m42.14s system

but still an improvement.  Guess I should get more cores next time I
upgrade ;).

> What is the idea behind this diff?  With a significant number of CPUs
> (16 or more), grabbing a global mutex for every page allocation & free
> creates a lot of contention, resulting in many CPU cycles wasted in
> system (kernel) time.  The idea of this diff is to add another layer on
> top of the global allocator to allocate and free pages in batches.
> Note that, in this diff, this cache is only used for page faults.
>
> The number 16 has been chosen after careful testing on an 80-CPU Ampere
> machine.  I tried to keep it as small as possible while making sure
> that multiple parallel page faults on a large number of CPUs do not
> result in contention.  I'd argue that "stealing" at most 64k per CPU is
> acceptable on any MP system.

Yes, and you do drain the cache, so if we really need that memory, most
of it will be available.

> The diff includes 3 new counters visible in "systat uvm" and "vmstat -s".
>
> When the page daemon kicks in we drain the cache of the current CPU
> which is the best we can do without adding too much complexity.

There is an interesting trade-off to be made between bunching allocs
and frees and having free pages in the cache.  You opted to use 15 of
the 16 slots for bunching allocs/frees.  Did you explore a more even
split like 8/8?  Mostly just curious.
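
To make that question a bit more concrete, here is roughly the kind of
thing I have in mind for the free side, purely as an illustration:
drain only half of the cache when it fills up instead of emptying it
completely, so some pages stay around for the next faults.
UVM_PCPU_KEEP is a made-up name and the watermark is arbitrary; this is
an untested sketch against the diff below, not a drop-in replacement.

#define UVM_PCPU_KEEP	(UVM_PCPU_MAXPAGES / 2)	/* hypothetical watermark */

int
uvm_pcpu_putpage(struct vm_page *pg)
{
	struct uvm_percpu *upc = &curcpu()->ci_uvm;
	struct pglist pgl;
	int nfree;

	if (upc->upc_count >= UVM_PCPU_MAXPAGES) {
		/* Give half of the pages back, keep the rest cached. */
		nfree = upc->upc_count - UVM_PCPU_KEEP;
		TAILQ_INIT(&pgl);
		while (upc->upc_count > UVM_PCPU_KEEP) {
			upc->upc_count--;
			TAILQ_INSERT_TAIL(&pgl,
			    upc->upc_pages[upc->upc_count], pageq);
		}
		uvm_pmr_freepageq(&pgl);
		atomic_sub_int(&uvmexp.percpucaches, nfree);
	}

	upc->upc_pages[upc->upc_count] = pg;
	upc->upc_count++;
	atomic_inc_int(&uvmexp.percpucaches);

	return 1;
}

The fill side would then only fill up to the same watermark on a miss.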

> I only tested amd64 and arm64, that's why there is such a define in
> uvm/uvm_page.c.  I'd be happy to hear from tests on other architectures
> and different topologies.  You'll need to edit $arch/include/cpu.h and
> modify the define.

Instead of having the defined(__amd64__) || defined(__arm64__), this
should probably be a #define __HAVE_UVM_PCU in <machine/cpu.h>.  Unless
the goal is to convert all architectures that support MULTIPROCESSOR
kernels swiftly.
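
Roughly like this (just a sketch; the exact name doesn't matter, it
simply has to come from the MD header):

/* sys/arch/amd64/include/cpu.h, sys/arch/arm64/include/cpu.h, ... */
#define __HAVE_UVM_PCU

/* sys/uvm/uvm_page.c */
#if defined(MULTIPROCESSOR) && defined(__HAVE_UVM_PCU)
/* per-CPU cache implementation */
#else
/* no-op (or direct pmemrange) implementations */
#endif

That way new architectures opt in by adding one line to their cpu.h
instead of growing the list in uvm_page.c.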

> This diff is really interesting because it now allows us to clearly see
> which syscalls are contending a lot.  Without surprise it's kbind(2),
> munmap(2) and mprotect(2).  It also shows which workloads are VFS-bound.
> That is what the "Buffer-Cache Cold" (BC Cold) numbers represent above.
> With a small number of CPUs we don't see much difference between the two.
>
> Comments?

A few more down below.

> Index: usr.bin/systat/uvm.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/systat/uvm.c,v
> diff -u -p -r1.6 uvm.c
> --- usr.bin/systat/uvm.c	27 Nov 2022 23:18:54 -0000	1.6
> +++ usr.bin/systat/uvm.c	12 Mar 2024 16:53:11 -0000
> @@ -80,11 +80,10 @@ struct uvmline uvmline[] = {
> 	{ &uvmexp.zeropages, &last_uvmexp.zeropages, "zeropages",
> 	  &uvmexp.pageins, &last_uvmexp.pageins, "pageins",
> 	  &uvmexp.fltrelckok, &last_uvmexp.fltrelckok, "fltrelckok" },
> -	{ &uvmexp.reserve_pagedaemon, &last_uvmexp.reserve_pagedaemon,
> -	    "reserve_pagedaemon",
> +	{ &uvmexp.percpucaches, &last_uvmexp.percpucaches, "percpucaches",
> 	  &uvmexp.pgswapin, &last_uvmexp.pgswapin, "pgswapin",
> 	  &uvmexp.fltanget, &last_uvmexp.fltanget, "fltanget" },
> -	{ &uvmexp.reserve_kernel, &last_uvmexp.reserve_kernel, "reserve_kernel",
> +	{ NULL, NULL, NULL,
> 	  &uvmexp.pgswapout, &last_uvmexp.pgswapout, "pgswapout",
> 	  &uvmexp.fltanretry, &last_uvmexp.fltanretry, "fltanretry" },
> 	{ NULL, NULL, NULL,
> @@ -143,13 +142,13 @@ struct uvmline uvmline[] = {
> 	  NULL, NULL, NULL },
> 	{ &uvmexp.pagesize, &last_uvmexp.pagesize, "pagesize",
> 	  &uvmexp.pdpending, &last_uvmexp.pdpending, "pdpending",
> -	  NULL, NULL, NULL },
> +	  NULL, NULL, "Per-CPU Counters" },
> 	{ &uvmexp.pagemask, &last_uvmexp.pagemask, "pagemask",
> 	  &uvmexp.pddeact, &last_uvmexp.pddeact, "pddeact",
> -	  NULL, NULL, NULL },
> +	  &uvmexp.pcphit, &last_uvmexp.pcphit, "pcphit" },
> 	{ &uvmexp.pageshift, &last_uvmexp.pageshift, "pageshift",
> 	  NULL, NULL, NULL,
> -	  NULL, NULL, NULL }
> +	  &uvmexp.pcpmiss, &last_uvmexp.pcpmiss, "pcpmiss" }
> };
>
> field_def fields_uvm[] = {
> Index: usr.bin/vmstat/vmstat.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/vmstat/vmstat.c,v
> diff -u -p -r1.155 vmstat.c
> --- usr.bin/vmstat/vmstat.c	4 Dec 2022 23:50:50 -0000	1.155
> +++ usr.bin/vmstat/vmstat.c	12 Mar 2024 16:49:56 -0000
> @@ -513,7 +513,12 @@ dosum(void)
> 	    uvmexp.reserve_pagedaemon);
> 	(void)printf("%11u pages reserved for kernel\n",
> 	    uvmexp.reserve_kernel);
> +	(void)printf("%11u pages in per-cpu caches\n",
> +	    uvmexp.percpucaches);
>
> +	/* per-cpu cache */
> +	(void)printf("%11u per-cpu cache hits\n", uvmexp.pcphit);
> +	(void)printf("%11u per-cpu cache misses\n", uvmexp.pcpmiss);
> 	/* swap */
> 	(void)printf("%11u swap pages\n", uvmexp.swpages);
> 	(void)printf("%11u swap pages in use\n", uvmexp.swpginuse);
> Index: sys/uvm/uvm.h
> ===================================================================
> RCS file: /cvs/src/sys/uvm/uvm.h,v
> diff -u -p -r1.71 uvm.h
> --- sys/uvm/uvm.h	7 Oct 2022 05:01:44 -0000	1.71
> +++ sys/uvm/uvm.h	6 Mar 2024 15:10:33 -0000
> @@ -47,6 +47,7 @@
> *
> * Locks used to protect struct members in this file:
> *	Q	uvm.pageqlock
> + *	F	uvm.fpageqlock
> */
> struct uvm {
> 	/* vm_page related parameters */
> @@ -58,7 +59,7 @@ struct uvm {
> 	struct mutex pageqlock;		/* [] lock for active/inactive page q */
> 	struct mutex fpageqlock;	/* [] lock for free page q + pdaemon */
> 	boolean_t page_init_done;	/* TRUE if uvm_page_init() finished */
> -	struct uvm_pmr_control pmr_control; /* pmemrange data */
> +	struct uvm_pmr_control pmr_control; /* [F] pmemrange data */
>
> 	/* page daemon trigger */
> 	int pagedaemon;			/* daemon sleeps on this */
> Index: sys/uvm/uvm_page.c
> ===================================================================
> RCS file: /cvs/src/sys/uvm/uvm_page.c,v
> diff -u -p -r1.174 uvm_page.c
> --- sys/uvm/uvm_page.c	13 Feb 2024 10:16:28 -0000	1.174
> +++ sys/uvm/uvm_page.c	18 Mar 2024 17:37:17 -0000
> @@ -75,6 +75,7 @@
> #include
>
> #include
> +#include
>
> /*
> * for object trees
> @@ -119,6 +120,8 @@ static vaddr_t virtual_space_end;
> static void uvm_pageinsert(struct vm_page *);
> static void uvm_pageremove(struct vm_page *);
> int uvm_page_owner_locked_p(struct vm_page *);
> +struct vm_page *uvm_pcpu_getpage(int);
> +int uvm_pcpu_putpage(struct vm_page *);
>
> /*
> * inline functions
> @@ -869,6 +872,7 @@ uvm_pagerealloc_multi(struct uvm_object
> 	return r;
> }
>
> +
> /*
> * uvm_pagealloc: allocate vm_page from a particular free list.
> *
> @@ -877,13 +881,11 @@ uvm_pagerealloc_multi(struct uvm_object
> * => only one of obj or anon can be non-null
> * => caller must activate/deactivate page if it is not wired.
> */
> -

Not sure what the purpose of moving a blank line is ;).

> struct vm_page *
> uvm_pagealloc(struct uvm_object *obj, voff_t off, struct vm_anon *anon,
>     int flags)
> {
> -	struct vm_page *pg;
> -	struct pglist pgl;
> +	struct vm_page *pg = NULL;
> 	int pmr_flags;
>
> 	KASSERT(obj == NULL || anon == NULL);
> @@ -906,12 +908,18 @@ uvm_pagealloc(struct uvm_object *obj, vo
>
> 	if (flags & UVM_PGA_ZERO)
> 		pmr_flags |= UVM_PLA_ZERO;
> -	TAILQ_INIT(&pgl);
> -	if (uvm_pmr_getpages(1, 0, 0, 1, 0, 1, pmr_flags, &pgl) != 0)
> -		goto fail;
>
> -	pg = TAILQ_FIRST(&pgl);
> -	KASSERT(pg != NULL && TAILQ_NEXT(pg, pageq) == NULL);
> +	pg = uvm_pcpu_getpage(pmr_flags);
> +	if (pg == NULL) {
> +		struct pglist pgl;
> +
> +		TAILQ_INIT(&pgl);
> +		if (uvm_pmr_getpages(1, 0, 0, 1, 0, 1, pmr_flags, &pgl) != 0)
> +			goto fail;
> +
> +		pg = TAILQ_FIRST(&pgl);
> +		KASSERT(pg != NULL && TAILQ_NEXT(pg, pageq) == NULL);
> +	}

Hmm, with the per-CPU caches enabled, retrying that uvm_pmr_getpages()
call probably isn't going to magically succeed.

>
> 	uvm_pagealloc_pg(pg, obj, off, anon);
> 	KASSERT((pg->pg_flags & PG_DEV) == 0);
> @@ -1025,7 +1033,8 @@ void
> uvm_pagefree(struct vm_page *pg)
> {
> 	uvm_pageclean(pg);
> -	uvm_pmr_freepages(pg, 1);
> +	if (uvm_pcpu_putpage(pg) == 0)
> +		uvm_pmr_freepages(pg, 1);
> }

And, the way this is done, you take the hit of having both a
conditional and a function call.  So I wonder if it makes sense to have
the no-cache uvm_pcpu_getpage() and uvm_pcpu_putpage() dummy
implementations do the alloc/free.  Probably micro-optimizations, but I
think it'll make the code more readable.

>
> /*
> @@ -1398,3 +1407,120 @@ uvm_pagecount(struct uvm_constraint_rang
> 	}
> 	return sz;
> }
> +
> +#if defined(MULTIPROCESSOR) && (defined(__amd64__) || defined(__arm64__))
> +
> +/*
> + * uvm_pcpu_draincache: populate the current CPU cache
> + */

Wrong function name in the comment!  If you ask me, it is a bit silly
to have the function name in the comment in the first place.  But
that's UVM for you.
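
Coming back to the dummy implementations mentioned above, something
along these lines is what I mean: the no-cache variants just talk to
pmemrange directly, so uvm_pagealloc() and uvm_pagefree() don't need
the extra conditional (untested sketch only):

struct vm_page *
uvm_pcpu_getpage(int pmr_flags)
{
	struct pglist pgl;

	/* No per-CPU cache: grab a single page straight from pmemrange. */
	TAILQ_INIT(&pgl);
	if (uvm_pmr_getpages(1, 0, 0, 1, 0, 1, pmr_flags, &pgl) != 0)
		return NULL;
	return TAILQ_FIRST(&pgl);
}

int
uvm_pcpu_putpage(struct vm_page *pg)
{
	/* No per-CPU cache: hand the page straight back to pmemrange. */
	uvm_pmr_freepages(pg, 1);
	return 1;
}

The cached variants would then need their own single-page fallback when
filling the cache fails, which also ties in with the retry remark above.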

> +int
> +uvm_pcpu_fillcache(void)
> +{
> +	struct uvm_percpu *upc = &curcpu()->ci_uvm;
> +	struct vm_page *pg;
> +	struct pglist pgl;
> +	int count;
> +
> +	/* Leave one free room to not drain the cache at the next free. */
> +	count = UVM_PCPU_MAXPAGES - upc->upc_count - 1;
> +
> +	TAILQ_INIT(&pgl);
> +	if (uvm_pmr_getpages(count, 0, 0, 1, 0, 1, UVM_PLA_NOWAIT, &pgl))
> +		return -1;
> +
> +	while ((pg = TAILQ_FIRST(&pgl)) != NULL) {
> +		TAILQ_REMOVE(&pgl, pg, pageq);
> +		upc->upc_pages[upc->upc_count] = pg;
> +		upc->upc_count++;
> +	}
> +	atomic_add_int(&uvmexp.percpucaches, count);
> +
> +	return 0;
> +}
> +
> +/*
> + * uvm_pcpu_getpage: allocate a page from the current CPU's cache
> + */
> +struct vm_page *
> +uvm_pcpu_getpage(int flags)
> +{
> +	struct uvm_percpu *upc = &curcpu()->ci_uvm;
> +	struct vm_page *pg;
> +
> +	if (upc->upc_count == 0) {
> +		atomic_inc_int(&uvmexp.pcpmiss);
> +		if (uvm_pcpu_fillcache())
> +			return NULL;
> +	} else {
> +		atomic_inc_int(&uvmexp.pcphit);
> +	}
> +
> +	atomic_dec_int(&uvmexp.percpucaches);
> +	upc->upc_count--;
> +	pg = upc->upc_pages[upc->upc_count];
> +
> +	if (flags & UVM_PLA_ZERO)
> +		uvm_pagezero(pg);
> +
> +	return pg;
> +}
> +
> +/*
> + * uvm_pcpu_draincache: empty the current CPU cache
> + */
> +void
> +uvm_pcpu_draincache(void)
> +{
> +	struct uvm_percpu *upc = &curcpu()->ci_uvm;
> +	struct pglist pgl;
> +	int i;
> +
> +	TAILQ_INIT(&pgl);
> +	for (i = 0; i < upc->upc_count; i++)
> +		TAILQ_INSERT_TAIL(&pgl, upc->upc_pages[i], pageq);
> +
> +	uvm_pmr_freepageq(&pgl);
> +
> +	atomic_sub_int(&uvmexp.percpucaches, upc->upc_count);
> +	upc->upc_count = 0;
> +	memset(upc->upc_pages, 0, sizeof(upc->upc_pages));
> +}
> +
> +/*
> + * uvm_pcpu_putpage: free a page and place it on the current CPU's cache
> + */
> +int
> +uvm_pcpu_putpage(struct vm_page *pg)
> +{
> +	struct uvm_percpu *upc = &curcpu()->ci_uvm;
> +
> +	if (upc->upc_count >= UVM_PCPU_MAXPAGES)
> +		uvm_pcpu_draincache();
> +
> +	upc->upc_pages[upc->upc_count] = pg;
> +	upc->upc_count++;
> +	atomic_inc_int(&uvmexp.percpucaches);
> +
> +	return 1;
> +}
> +
> +#else /* !MULTIPROCESSOR */
> +
> +struct vm_page *
> +uvm_pcpu_getpage(int flags)
> +{
> +	return NULL;
> +}
> +
> +void
> +uvm_pcpu_draincache(void)
> +{
> +}
> +
> +int
> +uvm_pcpu_putpage(struct vm_page *pg)
> +{
> +	return 0;
> +}
> +
> +#endif /* MULTIPROCESSOR */
> Index: sys/uvm/uvm_pmemrange.c
> ===================================================================
> RCS file: /cvs/src/sys/uvm/uvm_pmemrange.c,v
> diff -u -p -r1.63 uvm_pmemrange.c
> --- sys/uvm/uvm_pmemrange.c	10 Apr 2023 04:21:20 -0000	1.63
> +++ sys/uvm/uvm_pmemrange.c	11 Mar 2024 17:40:27 -0000
> @@ -1352,8 +1352,6 @@ uvm_pmr_freepageq(struct pglist *pgl)
> 	if (uvmexp.zeropages < UVM_PAGEZERO_TARGET)
> 		wakeup(&uvmexp.zeropages);
> 	uvm_unlock_fpageq();
> -
> -	return;
> }
>
> /*
> Index: sys/uvm/uvmexp.h
> ===================================================================
> RCS file: /cvs/src/sys/uvm/uvmexp.h,v
> diff -u -p -r1.11 uvmexp.h
> --- sys/uvm/uvmexp.h	27 Oct 2023 19:18:53 -0000	1.11
> +++ sys/uvm/uvmexp.h	18 Mar 2024 17:35:24 -0000
> @@ -64,7 +64,7 @@ struct uvmexp {
> 	int zeropages;		/* [F] number of zero'd pages */
> 	int reserve_pagedaemon; /* [I] # of pages reserved for pagedaemon */
> 	int reserve_kernel;	/* [I] # of pages reserved for kernel */
> -	int unused01;		/* formerly anonpages */
> +	int percpucaches;	/* [a] # of pages in per-CPU caches */
> 	int vnodepages;		/* XXX # of pages used by vnode page cache */
> 	int vtextpages;		/* XXX # of pages used by vtext vnodes */
>
> @@ -99,8 +99,8 @@ struct uvmexp {
> 	int syscalls;	/* system calls */
> 	int pageins;	/* pagein operation count */
> 			/* pageouts are in pdpageouts below */
> -	int unused07;	/* formerly obsolete_swapins */
> -	int unused08;	/* formerly obsolete_swapouts */
> +	int pcphit;	/* [a] # of pagealloc from per-CPU cache */
> +	int pcpmiss;	/* [a] # of times a per-CPU cache was empty */
> 	int pgswapin;	/* pages swapped in */
> 	int pgswapout;	/* pages swapped out */
> 	int forks;	/* forks */
> Index: sys/uvm/uvm_pdaemon.c
> ===================================================================
> RCS file: /cvs/src/sys/uvm/uvm_pdaemon.c,v
> diff -u -p -r1.109 uvm_pdaemon.c
> --- sys/uvm/uvm_pdaemon.c	27 Oct 2023 19:18:53 -0000	1.109
> +++ sys/uvm/uvm_pdaemon.c	17 Mar 2024 17:54:02 -0000
> @@ -80,6 +80,7 @@
> #endif
>
> #include
> +#include
>
> #include "drm.h"
>
> @@ -276,6 +277,8 @@ uvm_pageout(void *arg)
> #if NDRM > 0
> 		drmbackoff(size * 2);
> #endif
> +		uvm_pcpu_draincache();
> +
> 		uvm_lock_pageq();
>
> 		/*
> Index: sys/uvm/uvm_percpu.h
> ===================================================================
> RCS file: sys/uvm/uvm_percpu.h
> diff -N sys/uvm/uvm_percpu.h
> --- /dev/null	1 Jan 1970 00:00:00 -0000
> +++ sys/uvm/uvm_percpu.h	17 Mar 2024 17:48:49 -0000
> @@ -0,0 +1,43 @@
> +/*	$OpenBSD$	*/
> +
> +/*
> + * Copyright (c) 2024 Martin Pieuchot
> + *
> + * Permission to use, copy, modify, and distribute this software for any
> + * purpose with or without fee is hereby granted, provided that the above
> + * copyright notice and this permission notice appear in all copies.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> + */
> +
> +#ifndef _UVM_UVM_PCPU_H_
> +#define _UVM_UVM_PCPU_H_
> +
> +/*
> + * We want a per-CPU cache size to be as small as possible and at the
> + * same time gets rid of the `uvm_lock_fpageq' contention.
> + */
> +#define UVM_PCPU_MAXPAGES	16
> +
> +struct vm_page;
> +
> +/*
> + * Per-CPU page cache
> + *
> + * Locks used to protect struct members in this file:
> + *	o	owned (read/modified only) by its CPU
> + */
> +struct uvm_percpu {
> +	struct vm_page *upc_pages[UVM_PCPU_MAXPAGES];	/* [o] */
> +	int upc_count;			/* [o] # of pages in cache */
> +};
> +
> +void	uvm_pcpu_draincache(void);
> +
> +#endif /* _UVM_UVM_PCPU_H_ */
> Index: sys/arch/amd64/include/cpu.h
> ===================================================================
> RCS file: /cvs/src/sys/arch/amd64/include/cpu.h,v
> diff -u -p -r1.163 cpu.h
> --- sys/arch/amd64/include/cpu.h	25 Feb 2024 19:15:50 -0000	1.163
> +++ sys/arch/amd64/include/cpu.h	10 Mar 2024 17:33:55 -0000
> @@ -53,6 +53,7 @@
> #include
> #include
> #include
> +#include
>
> #ifdef _KERNEL
>
> @@ -201,6 +202,7 @@ struct cpu_info {
>
> #ifdef MULTIPROCESSOR
> 	struct srp_hazard ci_srp_hazards[SRP_HAZARD_NUM];
> +	struct uvm_percpu ci_uvm;	/* [o] page cache */
> #endif
>
> 	struct ksensordev ci_sensordev;
> Index: sys/arch/arm64/include/cpu.h
> ===================================================================
> RCS file: /cvs/src/sys/arch/arm64/include/cpu.h,v
> diff -u -p -r1.43 cpu.h
> --- sys/arch/arm64/include/cpu.h	25 Feb 2024 19:15:50 -0000	1.43
> +++ sys/arch/arm64/include/cpu.h	12 Mar 2024 16:23:37 -0000
> @@ -108,6 +108,7 @@ void arm32_vector_init(vaddr_t, int);
> #include
> #include
> #include
> +#include
>
> struct cpu_info {
> 	struct device *ci_dev;		/* Device corresponding to this CPU */
> @@ -161,6 +162,7 @@ struct cpu_info {
>
> #ifdef MULTIPROCESSOR
> 	struct srp_hazard ci_srp_hazards[SRP_HAZARD_NUM];
> +	struct uvm_percpu ci_uvm;
> 	volatile int ci_flags;
>
> 	volatile int ci_ddb_paused;
>
>