
From: David Gwynne <david@gwynne.id.au>
Subject: producer/consumer locking
To: tech@openbsd.org
Date: Sun, 4 May 2025 17:20:41 +1000

this provides coordination between things producing and consuming
data when you don't want to block or delay the thing producing data,
but it's ok to make the consumer do more work to compensate for that.

the mechanism is a generalisation of the coordination used in the mp
counters api and some of the process accounting code. data updated
by a producer is versioned, and the consumer reads the version on each
side of the critical section to see if it's been updated. if the
producer has updated the version, then the consumer has to retry.
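
in userland C the mechanism looks roughly like this (a sketch with
made-up names, using C11 fences where the kernel uses
membar_producer/membar_consumer):

```c
#include <stdatomic.h>

/* sketch only: "struct pc" and these function names are made up */
struct pc {
	volatile unsigned int gen;	/* odd while an update is in flight */
	unsigned int data;		/* the protected value */
};

static void
produce(struct pc *pc, unsigned int v)
{
	pc->gen++;			/* odd: update in progress */
	atomic_thread_fence(memory_order_release);
	pc->data = v;
	atomic_thread_fence(memory_order_release);
	pc->gen++;			/* even: update complete */
}

static unsigned int
consume(struct pc *pc)
{
	unsigned int enter, leave, v;

	do {
		do {
			enter = pc->gen;
		} while (enter & 1);	/* wait out an in-flight update */
		atomic_thread_fence(memory_order_acquire);
		v = pc->data;
		atomic_thread_fence(memory_order_acquire);
		leave = pc->gen;
	} while (enter != leave);	/* a producer got in; retry */

	return (v);
}
```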

the diff includes the migration of the process accounting to the
generalised api, and adds it to the cpu state counters on each cpu.
it's now possible to get a consistent snapshot of the cpu counters, even
if they were preempted by statclock.

i also have a pf diff that uses these. it allows pf to maintain counters
that userland can read without blocking the execution of pf. handy.
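
fwiw, the multiple producer path can be modelled in userland like
this (a sketch: C11 compare-exchange stands in for atomic_cas_uint,
and the names mirror but are not the kernel symbols):

```c
#include <stdatomic.h>

struct pc_lock {
	_Atomic unsigned int pcl_gen;
};

static unsigned int
mprod_enter(struct pc_lock *pcl)
{
	unsigned int gen, ngen;

	gen = atomic_load(&pcl->pcl_gen);
	for (;;) {
		while (gen & 1)		/* another producer is in */
			gen = atomic_load(&pcl->pcl_gen);
		ngen = gen + 1;		/* odd: claim the update */
		/* on failure gen is reloaded with the current value */
		if (atomic_compare_exchange_strong(&pcl->pcl_gen, &gen, ngen))
			return (ngen);
	}
}

static void
mprod_leave(struct pc_lock *pcl, unsigned int gen)
{
	atomic_store(&pcl->pcl_gen, gen + 1);	/* even: update complete */
}
```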

the only thing i'm worried about is the use of the alias function
attributes for !MULTIPROCESSOR kernels, but we use those in libc on all
our archs/compilers and it seems to be fine.
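
for reference, the alias attribute in question is just this pattern,
which makes both names resolve to the same code at link time with no
wrapper call (userland sketch with made-up names, not the kernel
symbols; gcc and clang on ELF targets):

```c
/* one real implementation... */
unsigned int
sprod(unsigned int gen)
{
	return (gen + 1);
}

/* ...and a second name aliased to it */
unsigned int	mprod(unsigned int)
		    __attribute__((alias("sprod")));
```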

the manpage looks like this:

PC_LOCK_INIT(9)            Kernel Developer's Manual           PC_LOCK_INIT(9)

NAME
     pc_lock_init, pc_cons_enter, pc_cons_leave, pc_sprod_enter,
     pc_sprod_leave, pc_mprod_enter, pc_mprod_leave, PC_LOCK_INITIALIZER -
     producer/consumer locks

SYNOPSIS
     #include <sys/pclock.h>

     void
     pc_lock_init(struct pc_lock *pcl);

     void
     pc_cons_enter(struct pc_lock *pcl, unsigned int *genp);

     int
     pc_cons_leave(struct pc_lock *pcl, unsigned int *genp);

     unsigned int
     pc_sprod_enter(struct pc_lock *pcl);

     void
     pc_sprod_leave(struct pc_lock *pcl, unsigned int gen);

     unsigned int
     pc_mprod_enter(struct pc_lock *pcl);

     void
     pc_mprod_leave(struct pc_lock *pcl, unsigned int gen);

     PC_LOCK_INITIALIZER();

DESCRIPTION
     The producer/consumer lock functions provide mechanisms for a consumer to
     read data without blocking or delaying another CPU or an interrupt when
     it is updating or producing data.  A variant of the producer locking
     functions provides mutual exclusion between concurrent producers.

     This is implemented by having producers version the protected data with a
     generation number.  Consumers of the data compare the generation number
     at the start of the critical section to the generation number at the end,
     and must retry reading the data if the generation number has changed.

     The pc_lock_init() function is used to initialise the producer/consumer
     lock pointed to by pcl.

     A producer/consumer lock declaration may be initialised with the
     PC_LOCK_INITIALIZER() macro.

   Consumer API
     pc_cons_enter() reads the current generation number from pcl and stores
     it in the memory provided by the caller via genp.

     pc_cons_leave() compares the generation number in pcl with the value
     stored in genp by pc_cons_enter() at the start of the critical section,
     and returns whether the reads within the critical section need to be
     retried because the data has been updated by the producer.

   Single Producer API
     The single producer API is optimised for updating data from code where
     only one producer can be in the critical section at a time.

     pc_sprod_enter() marks the beginning of a single producer critical
     section for the pcl producer/consumer lock.

     pc_sprod_leave() marks the end of a single producer critical section for
     the pcl producer/consumer lock.  The gen argument must be the value
     returned from the preceding pc_sprod_enter() call.

   Multiple Producer API
     The multiple producer API provides mutual exclusion between multiple CPUs
     entering the critical section concurrently.  Unlike mtx_enter(9), the
     multiple producer does not prevent preemption by interrupts, it only
     provides mutual exclusion between CPUs.  If protection from preemption is
     required, splraise(9) can be used to protect the producer critical
     section.

     pc_mprod_enter() marks the beginning of a multiple producer critical
     section for the pcl producer/consumer lock.

     pc_mprod_leave() marks the end of a multiple producer critical section for
     the pcl producer/consumer lock.  The gen argument must be the value
     returned from the preceding pc_mprod_enter() call.

     On uniprocessor kernels the multiple producer API is aliased to the
     single producer API.

CONTEXT
     pc_lock_init(), pc_cons_enter(), pc_cons_leave(), pc_sprod_enter(),
     pc_sprod_leave(), pc_mprod_enter(), and pc_mprod_leave() can be called
     during autoconf, from process context, or from interrupt context.

     pc_sprod_enter(), pc_sprod_leave(), pc_mprod_enter(), and
     pc_mprod_leave() may run concurrently with (ie, on another CPU from), or
     preempt (ie, run at a higher interrupt level than) pc_cons_enter() and
     pc_cons_leave().

     pc_sprod_enter(), pc_sprod_leave(), pc_mprod_enter(), and
     pc_mprod_leave() must not be preempted or interrupted by the producer or
     consumer API for the same lock.

RETURN VALUES
     pc_cons_leave() returns 0 if the critical section did not overlap with an
     update from a producer, or non-zero if the critical section must be
     retried.

EXAMPLES
     To produce or update data:

           struct pc_lock pc = PC_LOCK_INITIALIZER();

           void
           producer(void)
           {
                   unsigned int gen;

                   gen = pc_sprod_enter(&pc);
                   /* update data */
                   pc_sprod_leave(&pc, gen);
           }

     A consistent read of the data from a consumer:

           void
           consumer(void)
           {
                   unsigned int gen;

                   pc_cons_enter(&pc, &gen);
                   do {
                           /* read data */
                   } while (pc_cons_leave(&pc, &gen) != 0);
           }

SEE ALSO
     mutex(9), splraise(9)

HISTORY
     The pc_lock_init functions first appeared in OpenBSD 7.8.

AUTHORS
     The pc_lock_init functions were written by David Gwynne
     <dlg@openbsd.org>.

CAVEATS
     Updates must be produced infrequently enough to allow time for consumers
     to be able to get a consistent read without looping too often.

     Because consuming the data may loop when retrying, care must be taken to
     avoid side effects from reading the data multiple times, eg, when
     accumulating values.

ok?

Index: share/man/man9/Makefile
===================================================================
RCS file: /cvs/src/share/man/man9/Makefile,v
diff -u -p -r1.310 Makefile
--- share/man/man9/Makefile	24 Feb 2024 16:21:32 -0000	1.310
+++ share/man/man9/Makefile	4 May 2025 07:18:11 -0000
@@ -29,7 +29,8 @@ MAN=	aml_evalnode.9 atomic_add_int.9 ato
 	malloc.9 membar_sync.9 memcmp.9 mbuf.9 mbuf_tags.9 md5.9 mi_switch.9 \
 	microtime.9 ml_init.9 mq_init.9 mutex.9 \
 	namei.9 \
-	panic.9 pci_conf_read.9 pci_mapreg_map.9 pci_intr_map.9 physio.9 \
+	panic.9 pci_conf_read.9 pci_mapreg_map.9 pci_intr_map.9 \
+	pc_lock_init.9 physio.9 \
 	pmap.9 pool.9 pool_cache_init.9 ppsratecheck.9 printf.9 psignal.9 \
 	RBT_INIT.9 \
 	radio.9 arc4random.9 rasops.9 ratecheck.9 refcnt_init.9 resettodr.9 \
Index: share/man/man9/pc_lock_init.9
===================================================================
RCS file: share/man/man9/pc_lock_init.9
diff -N share/man/man9/pc_lock_init.9
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ share/man/man9/pc_lock_init.9	4 May 2025 07:18:11 -0000
@@ -0,0 +1,212 @@
+.\" $OpenBSD$
+.\"
+.\" Copyright (c) 2025 David Gwynne <dlg@openbsd.org>
+.\" All rights reserved.
+.\"
+.\" Permission to use, copy, modify, and distribute this software for any
+.\" purpose with or without fee is hereby granted, provided that the above
+.\" copyright notice and this permission notice appear in all copies.
+.\"
+.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+.\"
+.Dd $Mdocdate: May 4 2025 $
+.Dt PC_LOCK_INIT 9
+.Os
+.Sh NAME
+.Nm pc_lock_init ,
+.Nm pc_cons_enter ,
+.Nm pc_cons_leave ,
+.Nm pc_sprod_enter ,
+.Nm pc_sprod_leave ,
+.Nm pc_mprod_enter ,
+.Nm pc_mprod_leave ,
+.Nm PC_LOCK_INITIALIZER
+.Nd producer/consumer locks
+.Sh SYNOPSIS
+.In sys/pclock.h
+.Ft void
+.Fn pc_lock_init "struct pc_lock *pcl"
+.Ft void
+.Fn pc_cons_enter "struct pc_lock *pcl" "unsigned int *genp"
+.Ft int
+.Fn pc_cons_leave "struct pc_lock *pcl" "unsigned int *genp"
+.Ft unsigned int
+.Fn pc_sprod_enter "struct pc_lock *pcl"
+.Ft void
+.Fn pc_sprod_leave "struct pc_lock *pcl" "unsigned int gen"
+.Ft unsigned int
+.Fn pc_mprod_enter "struct pc_lock *pcl"
+.Ft void
+.Fn pc_mprod_leave "struct pc_lock *pcl" "unsigned int gen"
+.Fn PC_LOCK_INITIALIZER
+.Sh DESCRIPTION
+The producer/consumer lock functions provide mechanisms for a
+consumer to read data without blocking or delaying another CPU or
+an interrupt when it is updating or producing data.
+A variant of the producer locking functions provides mutual exclusion
+between multiple producers.
+.Pp
+This is implemented by having producers version the protected data
+with a generation number.
+Consumers of the data compare the generation number at the start
+of the critical section to the generation number at the end, and
+must retry reading the data if the generation number has changed.
+.Pp
+The
+.Fn pc_lock_init
+function is used to initialise the producer/consumer lock pointed to by
+.Fa pcl .
+.Pp
+A producer/consumer lock declaration may be initialised with the
+.Fn PC_LOCK_INITIALIZER
+macro.
+.Ss Consumer API
+.Fn pc_cons_enter
+reads the current generation number from
+.Fa pcl
+and stores it in the memory provided by the caller via
+.Fa genp .
+.Pp
+.Fn pc_cons_leave
+compares the generation number in
+.Fa pcl
+with the value stored in
+.Fa genp
+by
+.Fn pc_cons_enter
+at the start of the critical section, and returns whether the reads
+within the critical section need to be retried because the data has
+been updated by the producer.
+.Ss Single Producer API
+The single producer API is optimised for updating data from code where
+only one producer can be in the critical section at a time.
+.Pp
+.Fn pc_sprod_enter
+marks the beginning of a single producer critical section for the
+.Fa pcl
+producer/consumer lock.
+.Pp
+.Fn pc_sprod_leave
+marks the end of a single producer critical section for the
+.Fa pcl
+producer/consumer lock.
+The
+.Fa gen
+argument must be the value returned from the preceding
+.Fn pc_sprod_enter
+call.
+.Ss Multiple Producer API
+The multiple producer API provides mutual exclusion between multiple
+CPUs entering the critical section concurrently.
+Unlike
+.Xr mtx_enter 9 ,
+the multiple producer does not prevent preemption by interrupts,
+it only provides mutual exclusion between CPUs.
+If protection from preemption is required,
+.Xr splraise 9
+can be used to protect the producer critical section.
+.Pp
+.Fn pc_mprod_enter
+marks the beginning of a multiple producer critical section for the
+.Fa pcl
+producer/consumer lock.
+.Pp
+.Fn pc_mprod_leave
+marks the end of a multiple producer critical section for the
+.Fa pcl
+producer/consumer lock.
+The
+.Fa gen
+argument must be the value returned from the preceding
+.Fn pc_mprod_enter
+call.
+.Pp
+On uniprocessor kernels the multiple producer API is aliased to the
+single producer API.
+.Sh CONTEXT
+.Fn pc_lock_init ,
+.Fn pc_cons_enter ,
+.Fn pc_cons_leave ,
+.Fn pc_sprod_enter ,
+.Fn pc_sprod_leave ,
+.Fn pc_mprod_enter ,
+and
+.Fn pc_mprod_leave
+can be called during autoconf, from process context, or from interrupt context.
+.Pp
+.Fn pc_sprod_enter ,
+.Fn pc_sprod_leave ,
+.Fn pc_mprod_enter ,
+and
+.Fn pc_mprod_leave
+may run concurrently with (ie, on another CPU from),
+or preempt (ie, run at a higher interrupt level than)
+.Fn pc_cons_enter
+and
+.Fn pc_cons_leave .
+.Pp
+.Fn pc_sprod_enter ,
+.Fn pc_sprod_leave ,
+.Fn pc_mprod_enter ,
+and
+.Fn pc_mprod_leave
+must not be preempted or interrupted by the producer or consumer
+API for the same lock.
+.Sh RETURN VALUES
+.Fn pc_cons_leave
+returns 0 if the critical section did not overlap with an update
+from a producer, or non-zero if the critical section must be retried.
+.Sh EXAMPLES
+To produce or update data:
+.Bd -literal -offset indent
+struct pc_lock pc = PC_LOCK_INITIALIZER();
+
+void
+producer(void)
+{
+	unsigned int gen;
+
+	gen = pc_sprod_enter(&pc);
+	/* update data */
+	pc_sprod_leave(&pc, gen);
+}
+.Ed
+.Pp
+A consistent read of the data from a consumer:
+.Bd -literal -offset indent
+void
+consumer(void)
+{
+	unsigned int gen;
+
+	pc_cons_enter(&pc, &gen);
+	do {
+		/* read data */
+	} while (pc_cons_leave(&pc, &gen) != 0);
+}
+.Ed
+.Sh SEE ALSO
+.Xr mutex 9 ,
+.Xr splraise 9
+.Sh HISTORY
+The
+.Nm
+functions first appeared in
+.Ox 7.8 .
+.Sh AUTHORS
+The
+.Nm
+functions were written by
+.An David Gwynne Aq Mt dlg@openbsd.org .
+.Sh CAVEATS
+Updates must be produced infrequently enough to allow time for
+consumers to be able to get a consistent read without looping too
+often.
+.Pp
+Because consuming the data may loop when retrying, care must be
+taken to avoid side effects from reading the data multiple times,
+eg, when accumulating values.
Index: sys/kern/kern_clock.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_clock.c,v
diff -u -p -r1.125 kern_clock.c
--- sys/kern/kern_clock.c	2 May 2025 05:04:38 -0000	1.125
+++ sys/kern/kern_clock.c	4 May 2025 07:18:11 -0000
@@ -270,6 +270,7 @@ statclock(struct clockrequest *cr, void 
 	struct process *pr;
 	int tu_tick = -1;
 	int cp_time;
+	unsigned int gen;
 
 	if (statclock_is_randomized) {
 		count = clockrequest_advance_random(cr, statclock_min,
@@ -313,7 +314,9 @@ statclock(struct clockrequest *cr, void 
 			cp_time = CP_SPIN;
 	}
 
+	gen = pc_sprod_enter(&spc->spc_cp_time_lock);
 	spc->spc_cp_time[cp_time] += count;
+	pc_sprod_leave(&spc->spc_cp_time_lock, gen);
 
 	if (p != NULL) {
 		p->p_cpticks += count;
@@ -322,7 +325,7 @@ statclock(struct clockrequest *cr, void 
 			struct vmspace *vm = p->p_vmspace;
 			struct tusage *tu = &p->p_tu;
 
-			tu_enter(tu);
+			gen = tu_enter(tu);
 			tu->tu_ticks[tu_tick] += count;
 
 			/* maxrss is handled by uvm */
@@ -334,7 +337,7 @@ statclock(struct clockrequest *cr, void 
 				tu->tu_isrss +=
 				    (vm->vm_ssize << (PAGE_SHIFT - 10)) * count;
 			}
-			tu_leave(tu);
+			tu_leave(tu, gen);
 		}
 
 		/*
Index: sys/kern/kern_exec.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_exec.c,v
diff -u -p -r1.262 kern_exec.c
--- sys/kern/kern_exec.c	17 Feb 2025 10:07:10 -0000	1.262
+++ sys/kern/kern_exec.c	4 May 2025 07:18:11 -0000
@@ -699,7 +699,7 @@ sys_execve(struct proc *p, void *v, regi
 	/* reset CPU time usage for the thread, but not the process */
 	timespecclear(&p->p_tu.tu_runtime);
 	p->p_tu.tu_uticks = p->p_tu.tu_sticks = p->p_tu.tu_iticks = 0;
-	p->p_tu.tu_gen = 0;
+	pc_lock_init(&p->p_tu.tu_pcl);
 
 	memset(p->p_name, 0, sizeof p->p_name);
 
Index: sys/kern/kern_lock.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_lock.c,v
diff -u -p -r1.75 kern_lock.c
--- sys/kern/kern_lock.c	3 Jul 2024 01:36:50 -0000	1.75
+++ sys/kern/kern_lock.c	4 May 2025 07:18:11 -0000
@@ -24,6 +24,7 @@
 #include <sys/atomic.h>
 #include <sys/witness.h>
 #include <sys/mutex.h>
+#include <sys/pclock.h>
 
 #include <ddb/db_output.h>
 
@@ -418,3 +419,102 @@ _mtx_init_flags(struct mutex *m, int ipl
 	_mtx_init(m, ipl);
 }
 #endif /* WITNESS */
+
+void
+pc_lock_init(struct pc_lock *pcl)
+{
+	pcl->pcl_gen = 0;
+}
+
+unsigned int
+pc_sprod_enter(struct pc_lock *pcl)
+{
+	unsigned int gen;
+
+	gen = pcl->pcl_gen;
+	pcl->pcl_gen = ++gen;
+	membar_producer();
+
+	return (gen);
+}
+
+void
+pc_sprod_leave(struct pc_lock *pcl, unsigned int gen)
+{
+	membar_producer();
+	pcl->pcl_gen = ++gen;
+}
+
+#ifdef MULTIPROCESSOR
+unsigned int
+pc_mprod_enter(struct pc_lock *pcl)
+{
+	unsigned int gen, ngen, ogen;
+
+	gen = pcl->pcl_gen;
+	for (;;) {
+		while (gen & 1) {
+			CPU_BUSY_CYCLE();
+			gen = pcl->pcl_gen;
+		}
+
+		ngen = 1 + gen;
+		ogen = atomic_cas_uint(&pcl->pcl_gen, gen, ngen);
+		if (gen == ogen)
+			break;
+
+		CPU_BUSY_CYCLE();
+		gen = ogen;
+	}
+
+	membar_enter_after_atomic();
+	return (ngen);
+}
+
+void
+pc_mprod_leave(struct pc_lock *pcl, unsigned int gen)
+{
+	membar_exit();
+	pcl->pcl_gen = ++gen;
+}
+#else /* MULTIPROCESSOR */
+unsigned int	pc_mprod_enter(struct pc_lock *)
+		    __attribute__((alias("pc_sprod_enter")));
+void		pc_mprod_leave(struct pc_lock *, unsigned int)
+		    __attribute__((alias("pc_sprod_leave")));
+#endif /* MULTIPROCESSOR */
+
+void
+pc_cons_enter(struct pc_lock *pcl, unsigned int *genp)
+{
+	unsigned int gen;
+
+	gen = pcl->pcl_gen;
+	while (gen & 1) {
+		CPU_BUSY_CYCLE();
+		gen = pcl->pcl_gen;
+	}
+
+	membar_consumer();
+	*genp = gen;
+}
+
+int
+pc_cons_leave(struct pc_lock *pcl, unsigned int *genp)
+{
+	unsigned int gen;
+
+	membar_consumer();
+
+	gen = pcl->pcl_gen;
+	if (gen & 1) {
+		do {
+			CPU_BUSY_CYCLE();
+			gen = pcl->pcl_gen;
+		} while (gen & 1);
+	} else if (gen == *genp)
+		return (0);
+
+	*genp = gen;
+	return (EBUSY);
+}
Index: sys/kern/kern_resource.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_resource.c,v
diff -u -p -r1.94 kern_resource.c
--- sys/kern/kern_resource.c	2 May 2025 05:04:38 -0000	1.94
+++ sys/kern/kern_resource.c	4 May 2025 07:18:11 -0000
@@ -63,7 +63,7 @@ struct plimit	*lim_copy(struct plimit *)
 struct plimit	*lim_write_begin(void);
 void		 lim_write_commit(struct plimit *);
 
-void	tuagg_sumup(struct tusage *, const struct tusage *);
+void	tuagg_sumup(struct tusage *, struct tusage *);
 
 /*
  * Patchable maximum data and stack limits.
@@ -369,28 +369,15 @@ sys_getrlimit(struct proc *p, void *v, r
 
 /* Add the counts from *from to *tu, ensuring a consistent read of *from. */ 
 void
-tuagg_sumup(struct tusage *tu, const struct tusage *from)
+tuagg_sumup(struct tusage *tu, struct tusage *from)
 {
 	struct tusage	tmp;
-	uint64_t	enter, leave;
+	unsigned int	gen;
 
-	enter = from->tu_gen;
-	for (;;) {
-		/* the generation number is odd during an update */
-		while (enter & 1) {
-			CPU_BUSY_CYCLE();
-			enter = from->tu_gen;
-		}
-
-		membar_consumer();
+	pc_cons_enter(&from->tu_pcl, &gen);
+	do {
 		tmp = *from;
-		membar_consumer();
-		leave = from->tu_gen;
-
-		if (enter == leave)
-			break;
-		enter = leave;
-	}
+	} while (pc_cons_leave(&from->tu_pcl, &gen) != 0);
 
 	tu->tu_uticks += tmp.tu_uticks;
 	tu->tu_sticks += tmp.tu_sticks;
@@ -433,12 +420,14 @@ tuagg_get_process(struct tusage *tu, str
 void
 tuagg_add_process(struct process *pr, struct proc *p)
 {
+	unsigned int gen;
+
 	MUTEX_ASSERT_LOCKED(&pr->ps_mtx);
 	KASSERT(curproc == p || p->p_stat == SDEAD);
 
-	tu_enter(&pr->ps_tu);
+	gen = tu_enter(&pr->ps_tu);
 	tuagg_sumup(&pr->ps_tu, &p->p_tu);
-	tu_leave(&pr->ps_tu);
+	tu_leave(&pr->ps_tu, gen);
 
 	/* Now reset CPU time usage for the thread. */
 	timespecclear(&p->p_tu.tu_runtime);
@@ -452,6 +441,7 @@ tuagg_add_runtime(void)
 	struct schedstate_percpu *spc = &curcpu()->ci_schedstate;
 	struct proc *p = curproc;
 	struct timespec ts, delta;
+	unsigned int gen;
 
 	/*
 	 * Compute the amount of time during which the current
@@ -472,9 +462,9 @@ tuagg_add_runtime(void)
 	}
 	/* update spc_runtime */
 	spc->spc_runtime = ts;
-	tu_enter(&p->p_tu);
+	gen = tu_enter(&p->p_tu);
 	timespecadd(&p->p_tu.tu_runtime, &delta, &p->p_tu.tu_runtime);
-	tu_leave(&p->p_tu);
+	tu_leave(&p->p_tu, gen);
 }
 
 /*
Index: sys/kern/kern_sysctl.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_sysctl.c,v
diff -u -p -r1.465 kern_sysctl.c
--- sys/kern/kern_sysctl.c	27 Apr 2025 00:58:55 -0000	1.465
+++ sys/kern/kern_sysctl.c	4 May 2025 07:18:11 -0000
@@ -172,6 +172,8 @@ int hw_sysctl_locked(int *, u_int, void 
 
 int (*cpu_cpuspeed)(int *);
 
+static void sysctl_ci_cp_time(struct cpu_info *, uint64_t *);
+
 /*
  * Lock to avoid too many processes vslocking a large amount of memory
  * at the same time.
@@ -682,11 +684,15 @@ kern_sysctl_locked(int *name, u_int name
 		memset(cp_time, 0, sizeof(cp_time));
 
 		CPU_INFO_FOREACH(cii, ci) {
+			uint64_t ci_cp_time[CPUSTATES];
+
 			if (!cpu_is_online(ci))
 				continue;
+
 			n++;
+			sysctl_ci_cp_time(ci, ci_cp_time);
 			for (i = 0; i < CPUSTATES; i++)
-				cp_time[i] += ci->ci_schedstate.spc_cp_time[i];
+				cp_time[i] += ci_cp_time[i];
 		}
 
 		for (i = 0; i < CPUSTATES; i++)
@@ -2793,12 +2799,27 @@ sysctl_sensors(int *name, u_int namelen,
 }
 #endif	/* SMALL_KERNEL */
 
+static void
+sysctl_ci_cp_time(struct cpu_info *ci, uint64_t *cp_time)
+{
+	struct schedstate_percpu *spc = &ci->ci_schedstate;
+	unsigned int gen;
+
+	pc_cons_enter(&spc->spc_cp_time_lock, &gen);
+	do {
+		int i;
+		for (i = 0; i < CPUSTATES; i++)
+			cp_time[i] = spc->spc_cp_time[i];
+	} while (pc_cons_leave(&spc->spc_cp_time_lock, &gen) != 0);
+}
+
 int
 sysctl_cptime2(int *name, u_int namelen, void *oldp, size_t *oldlenp,
     void *newp, size_t newlen)
 {
 	CPU_INFO_ITERATOR cii;
 	struct cpu_info *ci;
+	uint64_t cp_time[CPUSTATES];
 	int found = 0;
 
 	if (namelen != 1)
@@ -2813,9 +2834,10 @@ sysctl_cptime2(int *name, u_int namelen,
 	if (!found)
 		return (ENOENT);
 
+	sysctl_ci_cp_time(ci, cp_time);
+
 	return (sysctl_rdstruct(oldp, oldlenp, newp,
-	    &ci->ci_schedstate.spc_cp_time,
-	    sizeof(ci->ci_schedstate.spc_cp_time)));
+	    cp_time, sizeof(cp_time)));
 }
 
 #if NAUDIO > 0
@@ -2881,7 +2903,7 @@ sysctl_cpustats(int *name, u_int namelen
 		return (ENOENT);
 
 	memset(&cs, 0, sizeof cs);
-	memcpy(&cs.cs_time, &ci->ci_schedstate.spc_cp_time, sizeof(cs.cs_time));
+	sysctl_ci_cp_time(ci, cs.cs_time);
 	cs.cs_flags = 0;
 	if (cpu_is_online(ci))
 		cs.cs_flags |= CPUSTATS_ONLINE;
Index: sys/kern/sched_bsd.c
===================================================================
RCS file: /cvs/src/sys/kern/sched_bsd.c,v
diff -u -p -r1.99 sched_bsd.c
--- sys/kern/sched_bsd.c	10 Mar 2025 09:28:56 -0000	1.99
+++ sys/kern/sched_bsd.c	4 May 2025 07:18:11 -0000
@@ -585,6 +585,7 @@ setperf_auto(void *v)
 	CPU_INFO_ITERATOR cii;
 	struct cpu_info *ci;
 	uint64_t idle, total, allidle = 0, alltotal = 0;
+	unsigned int gen;
 
 	if (!perfpolicy_dynamic())
 		return;
@@ -609,14 +610,23 @@ setperf_auto(void *v)
 			return;
 		}
 	CPU_INFO_FOREACH(cii, ci) {
+		struct schedstate_percpu *spc;
+
 		if (!cpu_is_online(ci))
 			continue;
-		total = 0;
-		for (i = 0; i < CPUSTATES; i++) {
-			total += ci->ci_schedstate.spc_cp_time[i];
-		}
+
+		spc = &ci->ci_schedstate;
+		pc_cons_enter(&spc->spc_cp_time_lock, &gen);
+		do {
+			total = 0;
+			for (i = 0; i < CPUSTATES; i++) {
+				total += spc->spc_cp_time[i];
+			}
+			idle = spc->spc_cp_time[CP_IDLE];
+		} while (pc_cons_leave(&spc->spc_cp_time_lock, &gen) != 0);
+
 		total -= totalticks[j];
-		idle = ci->ci_schedstate.spc_cp_time[CP_IDLE] - idleticks[j];
+		idle -= idleticks[j];
 		if (idle < total / 3)
 			speedup = 1;
 		alltotal += total;
Index: sys/sys/pclock.h
===================================================================
RCS file: sys/sys/pclock.h
diff -N sys/sys/pclock.h
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ sys/sys/pclock.h	4 May 2025 07:18:11 -0000
@@ -0,0 +1,49 @@
+/*	$OpenBSD$ */
+
+/*
+ * Copyright (c) 2023 David Gwynne <dlg@openbsd.org>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#ifndef _SYS_PCLOCK_H
+#define _SYS_PCLOCK_H
+
+#include <sys/_lock.h>
+
+struct pc_lock {
+	volatile unsigned int	 pcl_gen;
+};
+
+#ifdef _KERNEL
+
+#define PC_LOCK_INITIALIZER() { .pcl_gen = 0 }
+
+void		pc_lock_init(struct pc_lock *);
+
+/* single (non-interlocking) producer */
+unsigned int	pc_sprod_enter(struct pc_lock *);
+void		pc_sprod_leave(struct pc_lock *, unsigned int);
+
+/* multiple (interlocking) producers */
+unsigned int	pc_mprod_enter(struct pc_lock *);
+void		pc_mprod_leave(struct pc_lock *, unsigned int);
+
+/* consumer */
+void		pc_cons_enter(struct pc_lock *, unsigned int *);
+__warn_unused_result int
+		pc_cons_leave(struct pc_lock *, unsigned int *);
+
+#endif /* _KERNEL */
+
+#endif /* _SYS_PCLOCK_H */
Index: sys/sys/proc.h
===================================================================
RCS file: /cvs/src/sys/sys/proc.h,v
diff -u -p -r1.387 proc.h
--- sys/sys/proc.h	2 May 2025 05:04:38 -0000	1.387
+++ sys/sys/proc.h	4 May 2025 07:18:11 -0000
@@ -51,6 +51,7 @@
 #include <sys/rwlock.h>			/* For struct rwlock */
 #include <sys/sigio.h>			/* For struct sigio */
 #include <sys/refcnt.h>			/* For struct refcnt */
+#include <sys/pclock.h>
 
 #ifdef _KERNEL
 #include <sys/atomic.h>
@@ -91,8 +92,8 @@ struct	pgrp {
  * Each thread is immediately accumulated here. For processes only the
  * time of exited threads is accumulated and to get the proper process
  * time usage tuagg_get_process() needs to be called.
- * Accounting of threads is done lockless by curproc using the tu_gen
- * generation counter. Code should use tu_enter() and tu_leave() for this.
+ * Accounting of threads is done lockless by curproc using the tu_pcl
+ * pc_lock. Code should use tu_enter() and tu_leave() for this.
  * The process ps_tu structure is locked by the ps_mtx.
  */
 #define TU_UTICKS	0		/* Statclock hits in user mode. */
@@ -101,7 +102,7 @@ struct	pgrp {
 #define TU_TICKS_COUNT	3
 
 struct tusage {
-	uint64_t	tu_gen;		/* generation counter */
+	struct	pc_lock	tu_pcl;
 	uint64_t	tu_ticks[TU_TICKS_COUNT];
 #define tu_uticks	tu_ticks[TU_UTICKS]
 #define tu_sticks	tu_ticks[TU_STICKS]
@@ -125,8 +126,6 @@ struct tusage {
  * run-time information needed by threads.
  */
 #ifdef __need_process
-struct futex;
-LIST_HEAD(futex_list, futex);
 struct proc;
 struct tslpentry;
 TAILQ_HEAD(tslpqueue, tslpentry);
@@ -187,7 +186,6 @@ struct process {
 	struct	vmspace *ps_vmspace;	/* Address space */
 	pid_t	ps_pid;			/* [I] Process identifier. */
 
-	struct	futex_list ps_ftlist;	/* futexes attached to this process */
 	struct	tslpqueue ps_tslpqueue;	/* [p] queue of threads in thrsleep */
 	struct	rwlock	ps_lock;	/* per-process rwlock */
 	struct  mutex	ps_mtx;		/* per-process mutex */
@@ -353,9 +351,6 @@ struct proc {
 	struct	process *p_p;		/* [I] The process of this thread. */
 	TAILQ_ENTRY(proc) p_thr_link;	/* [K|m] Threads in a process linkage. */
 
-	TAILQ_ENTRY(proc) p_fut_link;	/* Threads in a futex linkage. */
-	struct	futex	*p_futex;	/* Current sleeping futex. */
-
 	/* substructures: */
 	struct	filedesc *p_fd;		/* copy of p_p->ps_fd */
 	struct	vmspace *p_vmspace;	/* [I] copy of p_p->ps_vmspace */
@@ -655,18 +650,16 @@ void cpuset_complement(struct cpuset *, 
 int cpuset_cardinality(struct cpuset *);
 struct cpu_info *cpuset_first(struct cpuset *);
 
-static inline void
+static inline unsigned int
 tu_enter(struct tusage *tu)
 {
-	++tu->tu_gen; /* make the generation number odd */
-	membar_producer();
+	return pc_sprod_enter(&tu->tu_pcl);
 }
 
 static inline void
-tu_leave(struct tusage *tu)
+tu_leave(struct tusage *tu, unsigned int gen)
 {
-	membar_producer();
-	++tu->tu_gen; /* make the generation number even again */
+	pc_sprod_leave(&tu->tu_pcl, gen);
 }
 
 #endif	/* _KERNEL */
Index: sys/sys/sched.h
===================================================================
RCS file: /cvs/src/sys/sys/sched.h,v
diff -u -p -r1.73 sched.h
--- sys/sys/sched.h	8 Jul 2024 14:46:47 -0000	1.73
+++ sys/sys/sched.h	4 May 2025 07:18:11 -0000
@@ -97,6 +97,7 @@ struct cpustats {
 
 #include <sys/clockintr.h>
 #include <sys/queue.h>
+#include <sys/pclock.h>
 
 #define	SCHED_NQS	32			/* 32 run queues. */
 
@@ -112,6 +113,7 @@ struct schedstate_percpu {
 	struct timespec spc_runtime;	/* time curproc started running */
 	volatile int spc_schedflags;	/* flags; see below */
 	u_int spc_schedticks;		/* ticks for schedclock() */
+	struct pc_lock spc_cp_time_lock;
 	u_int64_t spc_cp_time[CPUSTATES]; /* CPU state statistics */
 	u_char spc_curpriority;		/* usrpri of curproc */