From: Vitaliy Makkoveev <otto@bsdbox.dev>
Subject: Re: producer/consumer locking
To: David Gwynne <david@gwynne.id.au>
Cc: tech@openbsd.org
Date: Sun, 4 May 2025 17:33:08 +0300

Like this! I wanted to have something like mtx_enter_read()
for a long time.

> On 4 May 2025, at 10:20, David Gwynne <david@gwynne.id.au> wrote:
> 
> this provides coordination between things producing and consuming
> data when you don't want to block or delay the thing producing data,
> but it's ok to make the consumer do more work to compensate for that.
> 
> the mechanism is a generalisation of the coordination used in the mp
> counters api and some of the process accounting code. data updated
> by a producer is versioned, and the consumer reads the version on each
> side of the critical section to see if it's been updated. if the
> producer has updated the version, then the consumer has to retry.
> 
> the diff includes the migration of the process accounting to the
> generalised api, and adds it to the cpu state counters on each cpu.
> it's now possible to get a consistent snapshot of the cpu counters,
> even if the read is preempted by statclock.
> 
> i also have a pf diff that uses these. it allows pf to maintain counters
> that userland can read without blocking the execution of pf. handy.
> 
> the only thing im worried about is the use of the alias function
> attributes for !MULTIPROCESSOR kernels, but we use those in libc on all
> our archs/compilers and it seems to be fine.
> 
> the manpage looks like this:
> 
> PC_LOCK_INIT(9)            Kernel Developer's Manual           PC_LOCK_INIT(9)
> 
> NAME
>     pc_lock_init, pc_cons_enter, pc_cons_leave, pc_sprod_enter,
>     pc_sprod_leave, pc_mprod_enter, pc_mprod_leave, PC_LOCK_INITIALIZER -
>     producer/consumer locks
> 
> SYNOPSIS
>     #include <sys/pclock.h>
> 
>     void
>     pc_lock_init(struct pc_lock *pcl);
> 
>     void
>     pc_cons_enter(struct pc_lock *pcl, unsigned int *genp);
> 
>     int
>     pc_cons_leave(struct pc_lock *pcl, unsigned int *genp);
> 
>     unsigned int
>     pc_sprod_enter(struct pc_lock *pcl);
> 
>     void
>     pc_sprod_leave(struct pc_lock *pcl, unsigned int gen);
> 
>     unsigned int
>     pc_mprod_enter(struct pc_lock *pcl);
> 
>     void
>     pc_mprod_leave(struct pc_lock *pcl, unsigned int gen);
> 
>     PC_LOCK_INITIALIZER();
> 
> DESCRIPTION
>     The producer/consumer lock functions provide mechanisms for a consumer to
>     read data without blocking or delaying another CPU or an interrupt when
>     it is updating or producing data.  A variant of the producer locking
>     functions provides mutual exclusion between concurrent producers.
> 
>     This is implemented by having producers version the protected data with a
>     generation number.  Consumers of the data compare the generation number
>     at the start of the critical section to the generation number at the end,
>     and must retry reading the data if the generation number has changed.
> 
>     The pc_lock_init() function is used to initialise the producer/consumer
>     lock pointed to by pcl.
> 
>     A producer/consumer lock declaration may be initialised with the
>     PC_LOCK_INITIALIZER() macro.
> 
>   Consumer API
>     pc_cons_enter() reads the current generation number from pcl and stores
>     it in the memory provided by the caller via genp.
> 
>     pc_cons_leave() compares the generation number in pcl with the value
>     stored in genp by pc_cons_enter() at the start of the critical section,
>     and returns whether the reads within the critical section need to be
>     retried because the data has been updated by the producer.
> 
>   Single Producer API
>     The single producer API is optimised for updating data from code where
>     only a single producer can be running at a time.
> 
>     pc_sprod_enter() marks the beginning of a single producer critical
>     section for the pcl producer/consumer lock.
> 
>     pc_sprod_leave() marks the end of a single producer critical section for
>     the pcl producer/consumer lock.  The gen argument must be the value
>     returned from the preceding pc_sprod_enter() call.
> 
>   Multiple Producer API
>     The multiple producer API provides mutual exclusion between multiple CPUs
>     entering the critical section concurrently.  Unlike mtx_enter(9), the
>     multiple producer API does not prevent preemption by interrupts; it only
>     provides mutual exclusion between CPUs.  If protection from preemption is
>     required, splraise(9) can be used to protect the producer critical
>     section.
> 
>     pc_mprod_enter() marks the beginning of a multiple producer critical
>     section for the pcl producer/consumer lock.
> 
>     pc_mprod_leave() marks the end of a multiple producer critical section
>     for the pcl producer/consumer lock.  The gen argument must be the value
>     returned from the preceding pc_mprod_enter() call.
> 
>     On uniprocessor kernels the multiple producer API is aliased to the
>     single producer API.
> 
> CONTEXT
>     pc_lock_init(), pc_cons_enter(), pc_cons_leave(), pc_sprod_enter(),
>     pc_sprod_leave(), pc_mprod_enter(), pc_mprod_leave() can be called
>     during autoconf, from process context, or from interrupt context.
> 
>     pc_sprod_enter(), pc_sprod_leave(), pc_mprod_enter(), and
>     pc_mprod_leave() may run concurrently with (ie, on another CPU from) or
>     preempt (ie, run at a higher interrupt level than) pc_cons_enter() and
>     pc_cons_leave().
> 
>     pc_sprod_enter(), pc_sprod_leave(), pc_mprod_enter(), and
>     pc_mprod_leave() must not be preempted or interrupted by the producer or
>     consumer API for the same lock.
> 
> RETURN VALUES
>     pc_cons_leave() returns 0 if the critical section did not overlap with an
>     update from a producer, or non-zero if the critical section must be
>     retried.
> 
> EXAMPLES
>     To produce or update data:
> 
>           struct pc_lock pc = PC_LOCK_INITIALIZER();
> 
>           void
>           producer(void)
>           {
>                   unsigned int gen;
> 
>                   gen = pc_sprod_enter(&pc);
>                   /* update data */
>                   pc_sprod_leave(&pc, gen);
>           }
> 
>     A consistent read of the data from a consumer:
> 
>           void
>           consumer(void)
>           {
>                   unsigned int gen;
> 
>                   pc_cons_enter(&pc, &gen);
>                   do {
>                           /* read data */
>                   } while (pc_cons_leave(&pc, &gen) != 0);
>           }
> 
> SEE ALSO
>     mutex(9), splraise(9)
> 
> HISTORY
>     The pc_lock_init functions first appeared in OpenBSD 7.8.
> 
> AUTHORS
>     The pc_lock_init functions were written by David Gwynne
>     <dlg@openbsd.org>.
> 
> CAVEATS
>     Updates must be produced infrequently enough for consumers to get a
>     consistent read without looping too often.
> 
>     Because consuming the data may loop when retrying, care must be taken to
>     avoid side effects from reading the data multiple times, eg, when
>     accumulating values.
> 
> ok?
> 
> Index: share/man/man9/Makefile
> ===================================================================
> RCS file: /cvs/src/share/man/man9/Makefile,v
> diff -u -p -r1.310 Makefile
> --- share/man/man9/Makefile	24 Feb 2024 16:21:32 -0000	1.310
> +++ share/man/man9/Makefile	4 May 2025 07:18:11 -0000
> @@ -29,7 +29,8 @@ MAN=	aml_evalnode.9 atomic_add_int.9 ato
> 	malloc.9 membar_sync.9 memcmp.9 mbuf.9 mbuf_tags.9 md5.9 mi_switch.9 \
> 	microtime.9 ml_init.9 mq_init.9 mutex.9 \
> 	namei.9 \
> -	panic.9 pci_conf_read.9 pci_mapreg_map.9 pci_intr_map.9 physio.9 \
> +	panic.9 pci_conf_read.9 pci_mapreg_map.9 pci_intr_map.9 \
> +	pc_lock_init.9 physio.9 \
> 	pmap.9 pool.9 pool_cache_init.9 ppsratecheck.9 printf.9 psignal.9 \
> 	RBT_INIT.9 \
> 	radio.9 arc4random.9 rasops.9 ratecheck.9 refcnt_init.9 resettodr.9 \
> Index: share/man/man9/pc_lock_init.9
> ===================================================================
> RCS file: share/man/man9/pc_lock_init.9
> diff -N share/man/man9/pc_lock_init.9
> --- /dev/null	1 Jan 1970 00:00:00 -0000
> +++ share/man/man9/pc_lock_init.9	4 May 2025 07:18:11 -0000
> @@ -0,0 +1,212 @@
> +.\" $OpenBSD$
> +.\"
> +.\" Copyright (c) 2025 David Gwynne <dlg@openbsd.org>
> +.\" All rights reserved.
> +.\"
> +.\" Permission to use, copy, modify, and distribute this software for any
> +.\" purpose with or without fee is hereby granted, provided that the above
> +.\" copyright notice and this permission notice appear in all copies.
> +.\"
> +.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> +.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> +.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> +.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> +.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> +.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> +.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> +.\"
> +.Dd $Mdocdate: May 4 2025 $
> +.Dt PC_LOCK_INIT 9
> +.Os
> +.Sh NAME
> +.Nm pc_lock_init ,
> +.Nm pc_cons_enter ,
> +.Nm pc_cons_leave ,
> +.Nm pc_sprod_enter ,
> +.Nm pc_sprod_leave ,
> +.Nm pc_mprod_enter ,
> +.Nm pc_mprod_leave ,
> +.Nm PC_LOCK_INITIALIZER
> +.Nd producer/consumer locks
> +.Sh SYNOPSIS
> +.In sys/pclock.h
> +.Ft void
> +.Fn pc_lock_init "struct pc_lock *pcl"
> +.Ft void
> +.Fn pc_cons_enter "struct pc_lock *pcl" "unsigned int *genp"
> +.Ft int
> +.Fn pc_cons_leave "struct pc_lock *pcl" "unsigned int *genp"
> +.Ft unsigned int
> +.Fn pc_sprod_enter "struct pc_lock *pcl"
> +.Ft void
> +.Fn pc_sprod_leave "struct pc_lock *pcl" "unsigned int gen"
> +.Ft unsigned int
> +.Fn pc_mprod_enter "struct pc_lock *pcl"
> +.Ft void
> +.Fn pc_mprod_leave "struct pc_lock *pcl" "unsigned int gen"
> +.Fn PC_LOCK_INITIALIZER
> +.Sh DESCRIPTION
> +The producer/consumer lock functions provide mechanisms for a
> +consumer to read data without blocking or delaying another CPU or
> +an interrupt when it is updating or producing data.
> +A variant of the producer locking functions provides mutual exclusion
> +between multiple producers.
> +.Pp
> +This is implemented by having producers version the protected data
> +with a generation number.
> +Consumers of the data compare the generation number at the start
> +of the critical section to the generation number at the end, and
> +must retry reading the data if the generation number has changed.
> +.Pp
> +The
> +.Fn pc_lock_init
> +function is used to initialise the producer/consumer lock pointed to by
> +.Fa pcl .
> +.Pp
> +A producer/consumer lock declaration may be initialised with the
> +.Fn PC_LOCK_INITIALIZER
> +macro.
> +.Ss Consumer API
> +.Fn pc_cons_enter
> +reads the current generation number from
> +.Fa pcl
> +and stores it in the memory provided by the caller via
> +.Fa genp .
> +.Pp
> +.Fn pc_cons_leave
> +compares the generation number in
> +.Fa pcl
> +with the value stored in
> +.Fa genp
> +by
> +.Fn pc_cons_enter
> +at the start of the critical section, and returns whether the reads
> +within the critical section need to be retried because the data has
> +been updated by the producer.
> +.Ss Single Producer API
> +The single producer API is optimised for updating data from code where only a single producer can be running at a time.
> +.Pp
> +.Fn pc_sprod_enter
> +marks the beginning of a single producer critical section for the
> +.Fa pcl
> +producer/consumer lock.
> +.Pp
> +.Fn pc_sprod_leave
> +marks the end of a single producer critical section for the
> +.Fa pcl
> +producer/consumer lock.
> +The
> +.Fa gen
> +argument must be the value returned from the preceding
> +.Fn pc_sprod_enter
> +call.
> +.Ss Multiple Producer API
> +The multiple producer API provides mutual exclusion between multiple
> +CPUs entering the critical section concurrently.
> +Unlike
> +.Xr mtx_enter 9 ,
> +the multiple producer API does not prevent preemption by interrupts;
> +it only provides mutual exclusion between CPUs.
> +If protection from preemption is required,
> +.Xr splraise 9
> +can be used to protect the producer critical section.
> +.Pp
> +.Fn pc_mprod_enter
> +marks the beginning of a multiple producer critical section for the
> +.Fa pcl
> +producer/consumer lock.
> +.Pp
> +.Fn pc_mprod_leave
> +marks the end of a multiple producer critical section for the
> +.Fa pcl
> +producer/consumer lock.
> +The
> +.Fa gen
> +argument must be the value returned from the preceding
> +.Fn pc_mprod_enter
> +call.
> +.Pp
> +On uniprocessor kernels the multiple producer API is aliased to the
> +single producer API.
> +.Sh CONTEXT
> +.Fn pc_lock_init ,
> +.Fn pc_cons_enter ,
> +.Fn pc_cons_leave ,
> +.Fn pc_sprod_enter ,
> +.Fn pc_sprod_leave ,
> +.Fn pc_mprod_enter ,
> +.Fn pc_mprod_leave
> +can be called during autoconf, from process context, or from interrupt context.
> +.Pp
> +.Fn pc_sprod_enter ,
> +.Fn pc_sprod_leave ,
> +.Fn pc_mprod_enter ,
> +and
> +.Fn pc_mprod_leave
> +may run concurrently with (ie, on another CPU from)
> +or preempt (ie, run at a higher interrupt level than)
> +.Fn pc_cons_enter
> +and
> +.Fn pc_cons_leave .
> +.Pp
> +.Fn pc_sprod_enter ,
> +.Fn pc_sprod_leave ,
> +.Fn pc_mprod_enter ,
> +and
> +.Fn pc_mprod_leave
> +must not be preempted or interrupted by the producer or consumer
> +API for the same lock.
> +.Sh RETURN VALUES
> +.Fn pc_cons_leave
> +returns 0 if the critical section did not overlap with an update
> +from a producer, or non-zero if the critical section must be retried.
> +.Sh EXAMPLES
> +To produce or update data:
> +.Bd -literal -offset indent
> +struct pc_lock pc = PC_LOCK_INITIALIZER();
> +
> +void
> +producer(void)
> +{
> +	unsigned int gen;
> +
> +	gen = pc_sprod_enter(&pc);
> +	/* update data */
> +	pc_sprod_leave(&pc, gen);
> +}
> +.Ed
> +.Pp
> +A consistent read of the data from a consumer:
> +.Bd -literal -offset indent
> +void
> +consumer(void)
> +{
> +	unsigned int gen;
> +
> +	pc_cons_enter(&pc, &gen);
> +	do {
> +		/* read data */
> +	} while (pc_cons_leave(&pc, &gen) != 0);
> +}
> +.Ed
> +.Sh SEE ALSO
> +.Xr mutex 9 ,
> +.Xr splraise 9
> +.Sh HISTORY
> +The
> +.Nm
> +functions first appeared in
> +.Ox 7.8 .
> +.Sh AUTHORS
> +The
> +.Nm
> +functions were written by
> +.An David Gwynne Aq Mt dlg@openbsd.org .
> +.Sh CAVEATS
> +Updates must be produced infrequently enough
> +for consumers to get a consistent read
> +without looping too often.
> +.Pp
> +Because consuming the data may loop when retrying, care must be
> +taken to avoid side effects from reading the data multiple times,
> +eg, when accumulating values.
> Index: sys/kern/kern_clock.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_clock.c,v
> diff -u -p -r1.125 kern_clock.c
> --- sys/kern/kern_clock.c	2 May 2025 05:04:38 -0000	1.125
> +++ sys/kern/kern_clock.c	4 May 2025 07:18:11 -0000
> @@ -270,6 +270,7 @@ statclock(struct clockrequest *cr, void 
> 	struct process *pr;
> 	int tu_tick = -1;
> 	int cp_time;
> +	unsigned int gen;
> 
> 	if (statclock_is_randomized) {
> 		count = clockrequest_advance_random(cr, statclock_min,
> @@ -313,7 +314,9 @@ statclock(struct clockrequest *cr, void 
> 			cp_time = CP_SPIN;
> 	}
> 
> +	gen = pc_sprod_enter(&spc->spc_cp_time_lock);
> 	spc->spc_cp_time[cp_time] += count;
> +	pc_sprod_leave(&spc->spc_cp_time_lock, gen);
> 
> 	if (p != NULL) {
> 		p->p_cpticks += count;
> @@ -322,7 +325,7 @@ statclock(struct clockrequest *cr, void 
> 			struct vmspace *vm = p->p_vmspace;
> 			struct tusage *tu = &p->p_tu;
> 
> -			tu_enter(tu);
> +			gen = tu_enter(tu);
> 			tu->tu_ticks[tu_tick] += count;
> 
> 			/* maxrss is handled by uvm */
> @@ -334,7 +337,7 @@ statclock(struct clockrequest *cr, void 
> 				tu->tu_isrss +=
> 				    (vm->vm_ssize << (PAGE_SHIFT - 10)) * count;
> 			}
> -			tu_leave(tu);
> +			tu_leave(tu, gen);
> 		}
> 
> 		/*
> Index: sys/kern/kern_exec.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_exec.c,v
> diff -u -p -r1.262 kern_exec.c
> --- sys/kern/kern_exec.c	17 Feb 2025 10:07:10 -0000	1.262
> +++ sys/kern/kern_exec.c	4 May 2025 07:18:11 -0000
> @@ -699,7 +699,7 @@ sys_execve(struct proc *p, void *v, regi
> 	/* reset CPU time usage for the thread, but not the process */
> 	timespecclear(&p->p_tu.tu_runtime);
> 	p->p_tu.tu_uticks = p->p_tu.tu_sticks = p->p_tu.tu_iticks = 0;
> -	p->p_tu.tu_gen = 0;
> +	pc_lock_init(&p->p_tu.tu_pcl);
> 
> 	memset(p->p_name, 0, sizeof p->p_name);
> 
> Index: sys/kern/kern_lock.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_lock.c,v
> diff -u -p -r1.75 kern_lock.c
> --- sys/kern/kern_lock.c	3 Jul 2024 01:36:50 -0000	1.75
> +++ sys/kern/kern_lock.c	4 May 2025 07:18:11 -0000
> @@ -24,6 +24,7 @@
> #include <sys/atomic.h>
> #include <sys/witness.h>
> #include <sys/mutex.h>
> +#include <sys/pclock.h>
> 
> #include <ddb/db_output.h>
> 
> @@ -418,3 +419,102 @@ _mtx_init_flags(struct mutex *m, int ipl
> 	_mtx_init(m, ipl);
> }
> #endif /* WITNESS */
> +
> +void
> +pc_lock_init(struct pc_lock *pcl)
> +{
> +	pcl->pcl_gen = 0;
> +}
> +
> +unsigned int
> +pc_sprod_enter(struct pc_lock *pcl)
> +{
> +	unsigned int gen;
> +
> +	gen = pcl->pcl_gen;
> +	pcl->pcl_gen = ++gen;
> +	membar_producer();
> +
> +	return (gen);
> +}
> +
> +void
> +pc_sprod_leave(struct pc_lock *pcl, unsigned int gen)
> +{
> +	membar_producer();
> +	pcl->pcl_gen = ++gen;
> +}
> +
> +#ifdef MULTIPROCESSOR
> +unsigned int
> +pc_mprod_enter(struct pc_lock *pcl)
> +{
> +	unsigned int gen, ngen, ogen;
> +
> +	gen = pcl->pcl_gen;
> +	for (;;) {
> +		while (gen & 1) {
> +			CPU_BUSY_CYCLE();
> +			gen = pcl->pcl_gen;
> +		}
> +
> +		ngen = 1 + gen;
> +		ogen = atomic_cas_uint(&pcl->pcl_gen, gen, ngen);
> +		if (gen == ogen)
> +			break;
> +
> +		CPU_BUSY_CYCLE();
> +		gen = ogen;
> +	}
> +
> +	membar_enter_after_atomic();
> +	return (ngen);
> +}
> +
> +void
> +pc_mprod_leave(struct pc_lock *pcl, unsigned int gen)
> +{
> +	membar_exit();
> +	pcl->pcl_gen = ++gen;
> +}
> +#else /* MULTIPROCESSOR */
> +unsigned int	pc_mprod_enter(struct pc_lock *)
> +		    __attribute__((alias("pc_sprod_enter")));
> +void		pc_mprod_leave(struct pc_lock *, unsigned int)
> +		    __attribute__((alias("pc_sprod_leave")));
> +#endif /* MULTIPROCESSOR */
> +
> +void
> +pc_cons_enter(struct pc_lock *pcl, unsigned int *genp)
> +{
> +	unsigned int gen;
> +
> +	gen = pcl->pcl_gen;
> +	while (gen & 1) {
> +		CPU_BUSY_CYCLE();
> +		gen = pcl->pcl_gen;
> +	}
> +
> +	membar_consumer();
> +	*genp = gen;
> +}
> +
> +int
> +pc_cons_leave(struct pc_lock *pcl, unsigned int *genp)
> +{
> +	unsigned int gen;
> +
> +	membar_consumer();
> +
> +	gen = pcl->pcl_gen;
> +	if (gen & 1) {
> +		do {
> +			CPU_BUSY_CYCLE();
> +			gen = pcl->pcl_gen;
> +		} while (gen & 1);
> +	} else if (gen == *genp)
> +		return (0);
> +
> +	*genp = gen;
> +	return (EBUSY);
> +}
> Index: sys/kern/kern_resource.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_resource.c,v
> diff -u -p -r1.94 kern_resource.c
> --- sys/kern/kern_resource.c	2 May 2025 05:04:38 -0000	1.94
> +++ sys/kern/kern_resource.c	4 May 2025 07:18:11 -0000
> @@ -63,7 +63,7 @@ struct plimit	*lim_copy(struct plimit *)
> struct plimit	*lim_write_begin(void);
> void		 lim_write_commit(struct plimit *);
> 
> -void	tuagg_sumup(struct tusage *, const struct tusage *);
> +void	tuagg_sumup(struct tusage *, struct tusage *);
> 
> /*
>  * Patchable maximum data and stack limits.
> @@ -369,28 +369,15 @@ sys_getrlimit(struct proc *p, void *v, r
> 
> /* Add the counts from *from to *tu, ensuring a consistent read of *from. */ 
> void
> -tuagg_sumup(struct tusage *tu, const struct tusage *from)
> +tuagg_sumup(struct tusage *tu, struct tusage *from)
> {
> 	struct tusage	tmp;
> -	uint64_t	enter, leave;
> +	unsigned int	gen;
> 
> -	enter = from->tu_gen;
> -	for (;;) {
> -		/* the generation number is odd during an update */
> -		while (enter & 1) {
> -			CPU_BUSY_CYCLE();
> -			enter = from->tu_gen;
> -		}
> -
> -		membar_consumer();
> +	pc_cons_enter(&from->tu_pcl, &gen);
> +	do {
> 		tmp = *from;
> -		membar_consumer();
> -		leave = from->tu_gen;
> -
> -		if (enter == leave)
> -			break;
> -		enter = leave;
> -	}
> +	} while (pc_cons_leave(&from->tu_pcl, &gen) != 0);
> 
> 	tu->tu_uticks += tmp.tu_uticks;
> 	tu->tu_sticks += tmp.tu_sticks;
> @@ -433,12 +420,14 @@ tuagg_get_process(struct tusage *tu, str
> void
> tuagg_add_process(struct process *pr, struct proc *p)
> {
> +	unsigned int gen;
> +
> 	MUTEX_ASSERT_LOCKED(&pr->ps_mtx);
> 	KASSERT(curproc == p || p->p_stat == SDEAD);
> 
> -	tu_enter(&pr->ps_tu);
> +	gen = tu_enter(&pr->ps_tu);
> 	tuagg_sumup(&pr->ps_tu, &p->p_tu);
> -	tu_leave(&pr->ps_tu);
> +	tu_leave(&pr->ps_tu, gen);
> 
> 	/* Now reset CPU time usage for the thread. */
> 	timespecclear(&p->p_tu.tu_runtime);
> @@ -452,6 +441,7 @@ tuagg_add_runtime(void)
> 	struct schedstate_percpu *spc = &curcpu()->ci_schedstate;
> 	struct proc *p = curproc;
> 	struct timespec ts, delta;
> +	unsigned int gen;
> 
> 	/*
> 	 * Compute the amount of time during which the current
> @@ -472,9 +462,9 @@ tuagg_add_runtime(void)
> 	}
> 	/* update spc_runtime */
> 	spc->spc_runtime = ts;
> -	tu_enter(&p->p_tu);
> +	gen = tu_enter(&p->p_tu);
> 	timespecadd(&p->p_tu.tu_runtime, &delta, &p->p_tu.tu_runtime);
> -	tu_leave(&p->p_tu);
> +	tu_leave(&p->p_tu, gen);
> }
> 
> /*
> Index: sys/kern/kern_sysctl.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_sysctl.c,v
> diff -u -p -r1.465 kern_sysctl.c
> --- sys/kern/kern_sysctl.c	27 Apr 2025 00:58:55 -0000	1.465
> +++ sys/kern/kern_sysctl.c	4 May 2025 07:18:11 -0000
> @@ -172,6 +172,8 @@ int hw_sysctl_locked(int *, u_int, void 
> 
> int (*cpu_cpuspeed)(int *);
> 
> +static void sysctl_ci_cp_time(struct cpu_info *, uint64_t *);
> +
> /*
>  * Lock to avoid too many processes vslocking a large amount of memory
>  * at the same time.
> @@ -682,11 +684,15 @@ kern_sysctl_locked(int *name, u_int name
> 		memset(cp_time, 0, sizeof(cp_time));
> 
> 		CPU_INFO_FOREACH(cii, ci) {
> +			uint64_t ci_cp_time[CPUSTATES];
> +
> 			if (!cpu_is_online(ci))
> 				continue;
> +
> 			n++;
> +			sysctl_ci_cp_time(ci, ci_cp_time);
> 			for (i = 0; i < CPUSTATES; i++)
> -				cp_time[i] += ci->ci_schedstate.spc_cp_time[i];
> +				cp_time[i] += ci_cp_time[i];
> 		}
> 
> 		for (i = 0; i < CPUSTATES; i++)
> @@ -2793,12 +2799,27 @@ sysctl_sensors(int *name, u_int namelen,
> }
> #endif	/* SMALL_KERNEL */
> 
> +static void
> +sysctl_ci_cp_time(struct cpu_info *ci, uint64_t *cp_time)
> +{
> +	struct schedstate_percpu *spc = &ci->ci_schedstate;
> +	unsigned int gen;
> +
> +	pc_cons_enter(&spc->spc_cp_time_lock, &gen);
> +	do {
> +		int i;
> +		for (i = 0; i < CPUSTATES; i++)
> +			cp_time[i] = spc->spc_cp_time[i];
> +	} while (pc_cons_leave(&spc->spc_cp_time_lock, &gen) != 0);
> +}
> +
> int
> sysctl_cptime2(int *name, u_int namelen, void *oldp, size_t *oldlenp,
>     void *newp, size_t newlen)
> {
> 	CPU_INFO_ITERATOR cii;
> 	struct cpu_info *ci;
> +	uint64_t cp_time[CPUSTATES];
> 	int found = 0;
> 
> 	if (namelen != 1)
> @@ -2813,9 +2834,10 @@ sysctl_cptime2(int *name, u_int namelen,
> 	if (!found)
> 		return (ENOENT);
> 
> +	sysctl_ci_cp_time(ci, cp_time);
> +
> 	return (sysctl_rdstruct(oldp, oldlenp, newp,
> -	    &ci->ci_schedstate.spc_cp_time,
> -	    sizeof(ci->ci_schedstate.spc_cp_time)));
> +	    cp_time, sizeof(cp_time)));
> }
> 
> #if NAUDIO > 0
> @@ -2881,7 +2903,7 @@ sysctl_cpustats(int *name, u_int namelen
> 		return (ENOENT);
> 
> 	memset(&cs, 0, sizeof cs);
> -	memcpy(&cs.cs_time, &ci->ci_schedstate.spc_cp_time, sizeof(cs.cs_time));
> +	sysctl_ci_cp_time(ci, cs.cs_time);
> 	cs.cs_flags = 0;
> 	if (cpu_is_online(ci))
> 		cs.cs_flags |= CPUSTATS_ONLINE;
> Index: sys/kern/sched_bsd.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/sched_bsd.c,v
> diff -u -p -r1.99 sched_bsd.c
> --- sys/kern/sched_bsd.c	10 Mar 2025 09:28:56 -0000	1.99
> +++ sys/kern/sched_bsd.c	4 May 2025 07:18:11 -0000
> @@ -585,6 +585,7 @@ setperf_auto(void *v)
> 	CPU_INFO_ITERATOR cii;
> 	struct cpu_info *ci;
> 	uint64_t idle, total, allidle = 0, alltotal = 0;
> +	unsigned int gen;
> 
> 	if (!perfpolicy_dynamic())
> 		return;
> @@ -609,14 +610,23 @@ setperf_auto(void *v)
> 			return;
> 		}
> 	CPU_INFO_FOREACH(cii, ci) {
> +		struct schedstate_percpu *spc;
> +
> 		if (!cpu_is_online(ci))
> 			continue;
> -		total = 0;
> -		for (i = 0; i < CPUSTATES; i++) {
> -			total += ci->ci_schedstate.spc_cp_time[i];
> -		}
> +
> +		spc = &ci->ci_schedstate;
> +		pc_cons_enter(&spc->spc_cp_time_lock, &gen);
> +		do {
> +			total = 0;
> +			for (i = 0; i < CPUSTATES; i++) {
> +				total += spc->spc_cp_time[i];
> +			}
> +			idle = spc->spc_cp_time[CP_IDLE];
> +		} while (pc_cons_leave(&spc->spc_cp_time_lock, &gen) != 0);
> +
> 		total -= totalticks[j];
> -		idle = ci->ci_schedstate.spc_cp_time[CP_IDLE] - idleticks[j];
> +		idle -= idleticks[j];
> 		if (idle < total / 3)
> 			speedup = 1;
> 		alltotal += total;
> Index: sys/sys/pclock.h
> ===================================================================
> RCS file: sys/sys/pclock.h
> diff -N sys/sys/pclock.h
> --- /dev/null	1 Jan 1970 00:00:00 -0000
> +++ sys/sys/pclock.h	4 May 2025 07:18:11 -0000
> @@ -0,0 +1,49 @@
> +/*	$OpenBSD$ */
> +
> +/*
> + * Copyright (c) 2023 David Gwynne <dlg@openbsd.org>
> + *
> + * Permission to use, copy, modify, and distribute this software for any
> + * purpose with or without fee is hereby granted, provided that the above
> + * copyright notice and this permission notice appear in all copies.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> + */
> +
> +#ifndef _SYS_PCLOCK_H
> +#define _SYS_PCLOCK_H
> +
> +#include <sys/_lock.h>
> +
> +struct pc_lock {
> +	volatile unsigned int	 pcl_gen;
> +};
> +
> +#ifdef _KERNEL
> +
> +#define PC_LOCK_INITIALIZER() { .pcl_gen = 0 }
> +
> +void		pc_lock_init(struct pc_lock *);
> +
> +/* single (non-interlocking) producer */
> +unsigned int	pc_sprod_enter(struct pc_lock *);
> +void		pc_sprod_leave(struct pc_lock *, unsigned int);
> +
> +/* multiple (interlocking) producers */
> +unsigned int	pc_mprod_enter(struct pc_lock *);
> +void		pc_mprod_leave(struct pc_lock *, unsigned int);
> +
> +/* consumer */
> +void		pc_cons_enter(struct pc_lock *, unsigned int *);
> +__warn_unused_result int
> +		pc_cons_leave(struct pc_lock *, unsigned int *);
> +
> +#endif /* _KERNEL */
> +
> +#endif /* _SYS_PCLOCK_H */
> Index: sys/sys/proc.h
> ===================================================================
> RCS file: /cvs/src/sys/sys/proc.h,v
> diff -u -p -r1.387 proc.h
> --- sys/sys/proc.h	2 May 2025 05:04:38 -0000	1.387
> +++ sys/sys/proc.h	4 May 2025 07:18:11 -0000
> @@ -51,6 +51,7 @@
> #include <sys/rwlock.h>			/* For struct rwlock */
> #include <sys/sigio.h>			/* For struct sigio */
> #include <sys/refcnt.h>			/* For struct refcnt */
> +#include <sys/pclock.h>
> 
> #ifdef _KERNEL
> #include <sys/atomic.h>
> @@ -91,8 +92,8 @@ struct	pgrp {
>  * Each thread is immediately accumulated here. For processes only the
>  * time of exited threads is accumulated and to get the proper process
>  * time usage tuagg_get_process() needs to be called.
> - * Accounting of threads is done lockless by curproc using the tu_gen
> - * generation counter. Code should use tu_enter() and tu_leave() for this.
> + * Accounting of threads is done lockless by curproc using the tu_pcl
> + * pc_lock. Code should use tu_enter() and tu_leave() for this.
>  * The process ps_tu structure is locked by the ps_mtx.
>  */
> #define TU_UTICKS	0		/* Statclock hits in user mode. */
> @@ -101,7 +102,7 @@ struct	pgrp {
> #define TU_TICKS_COUNT	3
> 
> struct tusage {
> -	uint64_t	tu_gen;		/* generation counter */
> +	struct	pc_lock	tu_pcl;
> 	uint64_t	tu_ticks[TU_TICKS_COUNT];
> #define tu_uticks	tu_ticks[TU_UTICKS]
> #define tu_sticks	tu_ticks[TU_STICKS]
> @@ -125,8 +126,6 @@ struct tusage {
>  * run-time information needed by threads.
>  */
> #ifdef __need_process
> -struct futex;
> -LIST_HEAD(futex_list, futex);
> struct proc;
> struct tslpentry;
> TAILQ_HEAD(tslpqueue, tslpentry);
> @@ -187,7 +186,6 @@ struct process {
> 	struct	vmspace *ps_vmspace;	/* Address space */
> 	pid_t	ps_pid;			/* [I] Process identifier. */
> 
> -	struct	futex_list ps_ftlist;	/* futexes attached to this process */
> 	struct	tslpqueue ps_tslpqueue;	/* [p] queue of threads in thrsleep */
> 	struct	rwlock	ps_lock;	/* per-process rwlock */
> 	struct  mutex	ps_mtx;		/* per-process mutex */
> @@ -353,9 +351,6 @@ struct proc {
> 	struct	process *p_p;		/* [I] The process of this thread. */
> 	TAILQ_ENTRY(proc) p_thr_link;	/* [K|m] Threads in a process linkage. */
> 
> -	TAILQ_ENTRY(proc) p_fut_link;	/* Threads in a futex linkage. */
> -	struct	futex	*p_futex;	/* Current sleeping futex. */
> -
> 	/* substructures: */
> 	struct	filedesc *p_fd;		/* copy of p_p->ps_fd */
> 	struct	vmspace *p_vmspace;	/* [I] copy of p_p->ps_vmspace */
> @@ -655,18 +650,16 @@ void cpuset_complement(struct cpuset *, 
> int cpuset_cardinality(struct cpuset *);
> struct cpu_info *cpuset_first(struct cpuset *);
> 
> -static inline void
> +static inline unsigned int
> tu_enter(struct tusage *tu)
> {
> -	++tu->tu_gen; /* make the generation number odd */
> -	membar_producer();
> +	return pc_sprod_enter(&tu->tu_pcl);
> }
> 
> static inline void
> -tu_leave(struct tusage *tu)
> +tu_leave(struct tusage *tu, unsigned int gen)
> {
> -	membar_producer();
> -	++tu->tu_gen; /* make the generation number even again */
> +	pc_sprod_leave(&tu->tu_pcl, gen);
> }
> 
> #endif	/* _KERNEL */
> Index: sys/sys/sched.h
> ===================================================================
> RCS file: /cvs/src/sys/sys/sched.h,v
> diff -u -p -r1.73 sched.h
> --- sys/sys/sched.h	8 Jul 2024 14:46:47 -0000	1.73
> +++ sys/sys/sched.h	4 May 2025 07:18:11 -0000
> @@ -97,6 +97,7 @@ struct cpustats {
> 
> #include <sys/clockintr.h>
> #include <sys/queue.h>
> +#include <sys/pclock.h>
> 
> #define	SCHED_NQS	32			/* 32 run queues. */
> 
> @@ -112,6 +113,7 @@ struct schedstate_percpu {
> 	struct timespec spc_runtime;	/* time curproc started running */
> 	volatile int spc_schedflags;	/* flags; see below */
> 	u_int spc_schedticks;		/* ticks for schedclock() */
> +	struct pc_lock spc_cp_time_lock;
> 	u_int64_t spc_cp_time[CPUSTATES]; /* CPU state statistics */
> 	u_char spc_curpriority;		/* usrpri of curproc */
> 
>