producer/consumer locking
Like this! I wanted to have something like mtx_enter_read()
for a long time.
> On 4 May 2025, at 10:20, David Gwynne <david@gwynne.id.au> wrote:
>
> this provides coordination between things producing and consuming
> data when you don't want to block or delay the thing producing data,
> but it's ok to make the consumer do more work to compensate for that.
>
> the mechanism is a generalisation of the coordination used in the mp
> counters api and some of the process accounting code. data updated
> by a producer is versioned, and the consumer reads the version on each
> side of the critical section to see if it's been updated. if the
> producer has updated the version, then the consumer has to retry.
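>
> eg, with a couple of made up pkts/bytes counters (a sketch, not
> code from the diff), both sides of the mechanism look like this:
>
> struct counters {
> 	struct pc_lock	c_pcl;
> 	uint64_t	c_pkts;
> 	uint64_t	c_bytes;
> };
>
> void
> counters_add(struct counters *c, uint64_t len)
> {
> 	unsigned int gen;
>
> 	/* the version is odd while the update is in flight */
> 	gen = pc_sprod_enter(&c->c_pcl);
> 	c->c_pkts++;
> 	c->c_bytes += len;
> 	pc_sprod_leave(&c->c_pcl, gen);
> }
>
> void
> counters_read(struct counters *c, uint64_t *pkts, uint64_t *bytes)
> {
> 	unsigned int gen;
>
> 	/* retry the reads until the version is stable */
> 	pc_cons_enter(&c->c_pcl, &gen);
> 	do {
> 		*pkts = c->c_pkts;
> 		*bytes = c->c_bytes;
> 	} while (pc_cons_leave(&c->c_pcl, &gen) != 0);
> }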
>
> the diff includes the migration of the process accounting to the
> generalised api, and adds it to the cpu state counters on each cpu.
> it's now possible to get a consistent snapshot of the cpu counters, even
> if they were preempted by statclock.
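>
> eg, the new sysctl_ci_cp_time() in the diff below snapshots a
> cpu's counters like this:
>
> 	pc_cons_enter(&spc->spc_cp_time_lock, &gen);
> 	do {
> 		for (i = 0; i < CPUSTATES; i++)
> 			cp_time[i] = spc->spc_cp_time[i];
> 	} while (pc_cons_leave(&spc->spc_cp_time_lock, &gen) != 0);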
>
> i also have a pf diff that uses these. it allows pf to maintain counters
> that userland can read without blocking the execution of pf. handy.
>
> the only thing im worried about is the use of the alias function
> attributes for !MULTIPROCESSOR kernels, but we use those in libc on all
> our archs/compilers and it seems to be fine.
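>
> for reference, the alias attribute just gives the same code a
> second name, eg (a generic sketch of the pattern the diff uses):
>
> 	int real_impl(int x) { return (x + 1); }
> 	int other_name(int) __attribute__((alias("real_impl")));
>
> calls to other_name() then resolve directly to real_impl() with no
> wrapper or indirection, which is how pc_mprod_enter() becomes
> pc_sprod_enter() on !MULTIPROCESSOR kernels.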
>
> the manpage looks like this:
>
> PC_LOCK_INIT(9) Kernel Developer's Manual PC_LOCK_INIT(9)
>
> NAME
> pc_lock_init, pc_cons_enter, pc_cons_leave, pc_sprod_enter,
> pc_sprod_leave, pc_mprod_enter, pc_mprod_leave, PC_LOCK_INITIALIZER -
> producer/consumer locks
>
> SYNOPSIS
> #include <sys/pclock.h>
>
> void
> pc_lock_init(struct pc_lock *pcl);
>
> void
> pc_cons_enter(struct pc_lock *pcl, unsigned int *genp);
>
> int
> pc_cons_leave(struct pc_lock *pcl, unsigned int *genp);
>
> unsigned int
> pc_sprod_enter(struct pc_lock *pcl);
>
> void
> pc_sprod_leave(struct pc_lock *pcl, unsigned int gen);
>
> unsigned int
> pc_mprod_enter(struct pc_lock *pcl);
>
> void
> pc_mprod_leave(struct pc_lock *pcl, unsigned int gen);
>
> PC_LOCK_INITIALIZER();
>
> DESCRIPTION
> The producer/consumer lock functions provide mechanisms for a consumer to
> read data without blocking or delaying another CPU or an interrupt when
> it is updating or producing data. A variant of the producer locking
> functions provides mutual exclusion between concurrent producers.
>
> This is implemented by having producers version the protected data with a
> generation number. Consumers of the data compare the generation number
> at the start of the critical section to the generation number at the end,
> and must retry reading the data if the generation number has changed.
>
> The pc_lock_init() function is used to initialise the producer/consumer
> lock pointed to by pcl.
>
> A producer/consumer lock declaration may be initialised with the
> PC_LOCK_INITIALIZER() macro.
>
> Consumer API
> pc_cons_enter() reads the current generation number from pcl and stores
> it in the memory provided by the caller via genp.
>
> pc_cons_leave() compares the generation number in pcl with the value
> stored in genp by pc_cons_enter() at the start of the critical section,
> and returns whether the reads within the critical section need to be
> retried because the data has been updated by the producer.
>
> Single Producer API
> The single producer API is optimised for code where only one producer
> updates the data at a time.
>
> pc_sprod_enter() marks the beginning of a single producer critical
> section for the pcl producer/consumer lock.
>
> pc_sprod_leave() marks the end of a single producer critical section for
> the pcl producer/consumer lock. The gen argument must be the value
> returned from the preceding pc_sprod_enter() call.
>
> Multiple Producer API
> The multiple producer API provides mutual exclusion between multiple CPUs
> entering the critical section concurrently. Unlike mtx_enter(9), the
> multiple producer API does not prevent preemption by interrupts; it only
> provides mutual exclusion between CPUs. If protection from preemption is
> required, splraise(9) can be used to protect the producer critical
> section.
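>
> For example, a producer that must also exclude interrupts on the same
> CPU can combine the two (a sketch; IPL_SOFTNET is only an example
> level):
>
> 	int s;
> 	unsigned int gen;
>
> 	s = splraise(IPL_SOFTNET);
> 	gen = pc_mprod_enter(&pc);
> 	/* update data */
> 	pc_mprod_leave(&pc, gen);
> 	splx(s);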
>
> pc_mprod_enter() marks the beginning of a multiple producer critical
> section for the pcl producer/consumer lock.
>
> pc_mprod_leave() marks the end of a multiple producer critical section for
> the pcl producer/consumer lock. The gen argument must be the value
> returned from the preceding pc_mprod_enter() call.
>
> On uniprocessor kernels the multiple producer API is aliased to the
> single producer API.
>
> CONTEXT
> pc_lock_init(), pc_cons_enter(), pc_cons_leave(), pc_sprod_enter(),
> pc_sprod_leave(), pc_mprod_enter(), and pc_mprod_leave() can be called
> during autoconf, from process context, or from interrupt context.
>
> pc_sprod_enter(), pc_sprod_leave(), pc_mprod_enter(), and
> pc_mprod_leave() may run concurrently with (ie, on another CPU than), or
> preempt (ie, run at a higher interrupt level than) pc_cons_enter() and
> pc_cons_leave().
>
> pc_sprod_enter(), pc_sprod_leave(), pc_mprod_enter(), and
> pc_mprod_leave() must not be preempted or interrupted by the producer or
> consumer API for the same lock.
>
> RETURN VALUES
> pc_cons_leave() returns 0 if the critical section did not overlap with an
> update from a producer, or non-zero if the critical section must be
> retried.
>
> EXAMPLES
> To produce or update data:
>
> struct pc_lock pc = PC_LOCK_INITIALIZER();
>
> void
> producer(void)
> {
> 	unsigned int gen;
>
> 	gen = pc_sprod_enter(&pc);
> 	/* update data */
> 	pc_sprod_leave(&pc, gen);
> }
>
> A consistent read of the data from a consumer:
>
> void
> consumer(void)
> {
> 	unsigned int gen;
>
> 	pc_cons_enter(&pc, &gen);
> 	do {
> 		/* read data */
> 	} while (pc_cons_leave(&pc, &gen) != 0);
> }
>
> SEE ALSO
> mutex(9), splraise(9)
>
> HISTORY
> The pc_lock_init functions first appeared in OpenBSD 7.8.
>
> AUTHORS
> The pc_lock_init functions were written by David Gwynne
> <dlg@openbsd.org>.
>
> CAVEATS
> Updates must be produced infrequently enough to allow time for consumers
> to be able to get a consistent read without looping too often.
>
> Because consuming the data may loop when retrying, care must be taken to
> avoid side effects from reading the data multiple times, eg, when
> accumulating values.
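>
> For example, to accumulate values safely, snapshot the data into a
> local copy inside the retry loop and accumulate only after a
> consistent read has been taken, as the updated tuagg_sumup() in the
> diff does:
>
> 	struct tusage tmp;
> 	unsigned int gen;
>
> 	pc_cons_enter(&from->tu_pcl, &gen);
> 	do {
> 		tmp = *from;
> 	} while (pc_cons_leave(&from->tu_pcl, &gen) != 0);
>
> 	tu->tu_uticks += tmp.tu_uticks;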
>
> ok?
>
> Index: share/man/man9/Makefile
> ===================================================================
> RCS file: /cvs/src/share/man/man9/Makefile,v
> diff -u -p -r1.310 Makefile
> --- share/man/man9/Makefile 24 Feb 2024 16:21:32 -0000 1.310
> +++ share/man/man9/Makefile 4 May 2025 07:18:11 -0000
> @@ -29,7 +29,8 @@ MAN= aml_evalnode.9 atomic_add_int.9 ato
> malloc.9 membar_sync.9 memcmp.9 mbuf.9 mbuf_tags.9 md5.9 mi_switch.9 \
> microtime.9 ml_init.9 mq_init.9 mutex.9 \
> namei.9 \
> - panic.9 pci_conf_read.9 pci_mapreg_map.9 pci_intr_map.9 physio.9 \
> + panic.9 pci_conf_read.9 pci_mapreg_map.9 pci_intr_map.9 \
> + pc_lock_init.9 physio.9 \
> pmap.9 pool.9 pool_cache_init.9 ppsratecheck.9 printf.9 psignal.9 \
> RBT_INIT.9 \
> radio.9 arc4random.9 rasops.9 ratecheck.9 refcnt_init.9 resettodr.9 \
> Index: share/man/man9/pc_lock_init.9
> ===================================================================
> RCS file: share/man/man9/pc_lock_init.9
> diff -N share/man/man9/pc_lock_init.9
> --- /dev/null 1 Jan 1970 00:00:00 -0000
> +++ share/man/man9/pc_lock_init.9 4 May 2025 07:18:11 -0000
> @@ -0,0 +1,214 @@
> +.\" $OpenBSD$
> +.\"
> +.\" Copyright (c) 2025 David Gwynne <dlg@openbsd.org>
> +.\" All rights reserved.
> +.\"
> +.\" Permission to use, copy, modify, and distribute this software for any
> +.\" purpose with or without fee is hereby granted, provided that the above
> +.\" copyright notice and this permission notice appear in all copies.
> +.\"
> +.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> +.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> +.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> +.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> +.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> +.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> +.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> +.\"
> +.Dd $Mdocdate: May 4 2025 $
> +.Dt PC_LOCK_INIT 9
> +.Os
> +.Sh NAME
> +.Nm pc_lock_init ,
> +.Nm pc_cons_enter ,
> +.Nm pc_cons_leave ,
> +.Nm pc_sprod_enter ,
> +.Nm pc_sprod_leave ,
> +.Nm pc_mprod_enter ,
> +.Nm pc_mprod_leave ,
> +.Nm PC_LOCK_INITIALIZER
> +.Nd producer/consumer locks
> +.Sh SYNOPSIS
> +.In sys/pclock.h
> +.Ft void
> +.Fn pc_lock_init "struct pc_lock *pcl"
> +.Ft void
> +.Fn pc_cons_enter "struct pc_lock *pcl" "unsigned int *genp"
> +.Ft int
> +.Fn pc_cons_leave "struct pc_lock *pcl" "unsigned int *genp"
> +.Ft unsigned int
> +.Fn pc_sprod_enter "struct pc_lock *pcl"
> +.Ft void
> +.Fn pc_sprod_leave "struct pc_lock *pcl" "unsigned int gen"
> +.Ft unsigned int
> +.Fn pc_mprod_enter "struct pc_lock *pcl"
> +.Ft void
> +.Fn pc_mprod_leave "struct pc_lock *pcl" "unsigned int gen"
> +.Fn PC_LOCK_INITIALIZER
> +.Sh DESCRIPTION
> +The producer/consumer lock functions provide mechanisms for a
> +consumer to read data without blocking or delaying another CPU or
> +an interrupt when it is updating or producing data.
> +A variant of the producer locking functions provides mutual exclusion
> +between multiple producers.
> +.Pp
> +This is implemented by having producers version the protected data
> +with a generation number.
> +Consumers of the data compare the generation number at the start
> +of the critical section to the generation number at the end, and
> +must retry reading the data if the generation number has changed.
> +.Pp
> +The
> +.Fn pc_lock_init
> +function is used to initialise the producer/consumer lock pointed to by
> +.Fa pcl .
> +.Pp
> +A producer/consumer lock declaration may be initialised with the
> +.Fn PC_LOCK_INITIALIZER
> +macro.
> +.Ss Consumer API
> +.Fn pc_cons_enter
> +reads the current generation number from
> +.Fa pcl
> +and stores it in the memory provided by the caller via
> +.Fa genp .
> +.Pp
> +.Fn pc_cons_leave
> +compares the generation number in
> +.Fa pcl
> +with the value stored in
> +.Fa genp
> +by
> +.Fn pc_cons_enter
> +at the start of the critical section, and returns whether the reads
> +within the critical section need to be retried because the data has
> +been updated by the producer.
> +.Ss Single Producer API
> +The single producer API is optimised for code where only one
> +producer updates the data at a time.
> +.Pp
> +.Fn pc_sprod_enter
> +marks the beginning of a single producer critical section for the
> +.Fa pcl
> +producer/consumer lock.
> +.Pp
> +.Fn pc_sprod_leave
> +marks the end of a single producer critical section for the
> +.Fa pcl
> +producer/consumer lock.
> +The
> +.Fa gen
> +argument must be the value returned from the preceding
> +.Fn pc_sprod_enter
> +call.
> +.Ss Multiple Producer API
> +The multiple producer API provides mutual exclusion between multiple
> +CPUs entering the critical section concurrently.
> +Unlike
> +.Xr mtx_enter 9 ,
> +the multiple producer API does not prevent preemption by interrupts;
> +it only provides mutual exclusion between CPUs.
> +If protection from preemption is required,
> +.Xr splraise 9
> +can be used to protect the producer critical section.
> +.Pp
> +.Fn pc_mprod_enter
> +marks the beginning of a multiple producer critical section for the
> +.Fa pcl
> +producer/consumer lock.
> +.Pp
> +.Fn pc_mprod_leave
> +marks the end of a multiple producer critical section for the
> +.Fa pcl
> +producer/consumer lock.
> +The
> +.Fa gen
> +argument must be the value returned from the preceding
> +.Fn pc_mprod_enter
> +call.
> +.Pp
> +On uniprocessor kernels the multiple producer API is aliased to the
> +single producer API.
> +.Sh CONTEXT
> +.Fn pc_lock_init ,
> +.Fn pc_cons_enter ,
> +.Fn pc_cons_leave ,
> +.Fn pc_sprod_enter ,
> +.Fn pc_sprod_leave ,
> +.Fn pc_mprod_enter ,
> +and
> +.Fn pc_mprod_leave
> +can be called during autoconf, from process context, or from interrupt context.
> +.Pp
> +.Fn pc_sprod_enter ,
> +.Fn pc_sprod_leave ,
> +.Fn pc_mprod_enter ,
> +and
> +.Fn pc_mprod_leave
> +may run concurrently with (ie, on another CPU than),
> +or preempt (ie, run at a higher interrupt level than)
> +.Fn pc_cons_enter
> +and
> +.Fn pc_cons_leave .
> +.Pp
> +.Fn pc_sprod_enter ,
> +.Fn pc_sprod_leave ,
> +.Fn pc_mprod_enter ,
> +and
> +.Fn pc_mprod_leave
> +must not be preempted or interrupted by the producer or consumer
> +API for the same lock.
> +.Sh RETURN VALUES
> +.Fn pc_cons_leave
> +returns 0 if the critical section did not overlap with an update
> +from a producer, or non-zero if the critical section must be retried.
> +.Sh EXAMPLES
> +To produce or update data:
> +.Bd -literal -offset indent
> +struct pc_lock pc = PC_LOCK_INITIALIZER();
> +
> +void
> +producer(void)
> +{
> + unsigned int gen;
> +
> + gen = pc_sprod_enter(&pc);
> + /* update data */
> + pc_sprod_leave(&pc, gen);
> +}
> +.Ed
> +.Pp
> +A consistent read of the data from a consumer:
> +.Bd -literal -offset indent
> +void
> +consumer(void)
> +{
> + unsigned int gen;
> +
> + pc_cons_enter(&pc, &gen);
> + do {
> + /* read data */
> + } while (pc_cons_leave(&pc, &gen) != 0);
> +}
> +.Ed
> +.Sh SEE ALSO
> +.Xr mutex 9 ,
> +.Xr splraise 9
> +.Sh HISTORY
> +The
> +.Nm
> +functions first appeared in
> +.Ox 7.8 .
> +.Sh AUTHORS
> +The
> +.Nm
> +functions were written by
> +.An David Gwynne Aq Mt dlg@openbsd.org .
> +.Sh CAVEATS
> +Updates must be produced infrequently enough to allow time for
> +consumers to be able to get a consistent read without looping too
> +often.
> +.Pp
> +Because consuming the data may loop when retrying, care must be
> +taken to avoid side effects from reading the data multiple times,
> +eg, when accumulating values.
> Index: sys/kern/kern_clock.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_clock.c,v
> diff -u -p -r1.125 kern_clock.c
> --- sys/kern/kern_clock.c 2 May 2025 05:04:38 -0000 1.125
> +++ sys/kern/kern_clock.c 4 May 2025 07:18:11 -0000
> @@ -270,6 +270,7 @@ statclock(struct clockrequest *cr, void
> struct process *pr;
> int tu_tick = -1;
> int cp_time;
> + unsigned int gen;
>
> if (statclock_is_randomized) {
> count = clockrequest_advance_random(cr, statclock_min,
> @@ -313,7 +314,9 @@ statclock(struct clockrequest *cr, void
> cp_time = CP_SPIN;
> }
>
> + gen = pc_sprod_enter(&spc->spc_cp_time_lock);
> spc->spc_cp_time[cp_time] += count;
> + pc_sprod_leave(&spc->spc_cp_time_lock, gen);
>
> if (p != NULL) {
> p->p_cpticks += count;
> @@ -322,7 +325,7 @@ statclock(struct clockrequest *cr, void
> struct vmspace *vm = p->p_vmspace;
> struct tusage *tu = &p->p_tu;
>
> - tu_enter(tu);
> + gen = tu_enter(tu);
> tu->tu_ticks[tu_tick] += count;
>
> /* maxrss is handled by uvm */
> @@ -334,7 +337,7 @@ statclock(struct clockrequest *cr, void
> tu->tu_isrss +=
> (vm->vm_ssize << (PAGE_SHIFT - 10)) * count;
> }
> - tu_leave(tu);
> + tu_leave(tu, gen);
> }
>
> /*
> Index: sys/kern/kern_exec.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_exec.c,v
> diff -u -p -r1.262 kern_exec.c
> --- sys/kern/kern_exec.c 17 Feb 2025 10:07:10 -0000 1.262
> +++ sys/kern/kern_exec.c 4 May 2025 07:18:11 -0000
> @@ -699,7 +699,7 @@ sys_execve(struct proc *p, void *v, regi
> /* reset CPU time usage for the thread, but not the process */
> timespecclear(&p->p_tu.tu_runtime);
> p->p_tu.tu_uticks = p->p_tu.tu_sticks = p->p_tu.tu_iticks = 0;
> - p->p_tu.tu_gen = 0;
> + pc_lock_init(&p->p_tu.tu_pcl);
>
> memset(p->p_name, 0, sizeof p->p_name);
>
> Index: sys/kern/kern_lock.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_lock.c,v
> diff -u -p -r1.75 kern_lock.c
> --- sys/kern/kern_lock.c 3 Jul 2024 01:36:50 -0000 1.75
> +++ sys/kern/kern_lock.c 4 May 2025 07:18:11 -0000
> @@ -24,6 +24,7 @@
> #include <sys/atomic.h>
> #include <sys/witness.h>
> #include <sys/mutex.h>
> +#include <sys/pclock.h>
>
> #include <ddb/db_output.h>
>
> @@ -418,3 +419,102 @@ _mtx_init_flags(struct mutex *m, int ipl
> _mtx_init(m, ipl);
> }
> #endif /* WITNESS */
> +
> +void
> +pc_lock_init(struct pc_lock *pcl)
> +{
> + pcl->pcl_gen = 0;
> +}
> +
> +unsigned int
> +pc_sprod_enter(struct pc_lock *pcl)
> +{
> + unsigned int gen;
> +
> + gen = pcl->pcl_gen;
> + pcl->pcl_gen = ++gen;
> + membar_producer();
> +
> + return (gen);
> +}
> +
> +void
> +pc_sprod_leave(struct pc_lock *pcl, unsigned int gen)
> +{
> + membar_producer();
> + pcl->pcl_gen = ++gen;
> +}
> +
> +#ifdef MULTIPROCESSOR
> +unsigned int
> +pc_mprod_enter(struct pc_lock *pcl)
> +{
> + unsigned int gen, ngen, ogen;
> +
> + gen = pcl->pcl_gen;
> + for (;;) {
> + while (gen & 1) {
> + CPU_BUSY_CYCLE();
> + gen = pcl->pcl_gen;
> + }
> +
> + ngen = 1 + gen;
> + ogen = atomic_cas_uint(&pcl->pcl_gen, gen, ngen);
> + if (gen == ogen)
> + break;
> +
> + CPU_BUSY_CYCLE();
> + gen = ogen;
> + }
> +
> + membar_enter_after_atomic();
> + return (ngen);
> +}
> +
> +void
> +pc_mprod_leave(struct pc_lock *pcl, unsigned int gen)
> +{
> + membar_exit();
> + pcl->pcl_gen = ++gen;
> +}
> +#else /* MULTIPROCESSOR */
> +unsigned int pc_mprod_enter(struct pc_lock *)
> + __attribute__((alias("pc_sprod_enter")));
> +void pc_mprod_leave(struct pc_lock *, unsigned int)
> + __attribute__((alias("pc_sprod_leave")));
> +#endif /* MULTIPROCESSOR */
> +
> +void
> +pc_cons_enter(struct pc_lock *pcl, unsigned int *genp)
> +{
> + unsigned int gen;
> +
> + gen = pcl->pcl_gen;
> + while (gen & 1) {
> + CPU_BUSY_CYCLE();
> + gen = pcl->pcl_gen;
> + }
> +
> + membar_consumer();
> + *genp = gen;
> +}
> +
> +int
> +pc_cons_leave(struct pc_lock *pcl, unsigned int *genp)
> +{
> + unsigned int gen;
> +
> + membar_consumer();
> +
> + gen = pcl->pcl_gen;
> + if (gen & 1) {
> + do {
> + CPU_BUSY_CYCLE();
> + gen = pcl->pcl_gen;
> + } while (gen & 1);
> + } else if (gen == *genp)
> + return (0);
> +
> + *genp = gen;
> + return (EBUSY);
> +}
> Index: sys/kern/kern_resource.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_resource.c,v
> diff -u -p -r1.94 kern_resource.c
> --- sys/kern/kern_resource.c 2 May 2025 05:04:38 -0000 1.94
> +++ sys/kern/kern_resource.c 4 May 2025 07:18:11 -0000
> @@ -63,7 +63,7 @@ struct plimit *lim_copy(struct plimit *)
> struct plimit *lim_write_begin(void);
> void lim_write_commit(struct plimit *);
>
> -void tuagg_sumup(struct tusage *, const struct tusage *);
> +void tuagg_sumup(struct tusage *, struct tusage *);
>
> /*
> * Patchable maximum data and stack limits.
> @@ -369,28 +369,15 @@ sys_getrlimit(struct proc *p, void *v, r
>
> /* Add the counts from *from to *tu, ensuring a consistent read of *from. */
> void
> -tuagg_sumup(struct tusage *tu, const struct tusage *from)
> +tuagg_sumup(struct tusage *tu, struct tusage *from)
> {
> struct tusage tmp;
> - uint64_t enter, leave;
> + unsigned int gen;
>
> - enter = from->tu_gen;
> - for (;;) {
> - /* the generation number is odd during an update */
> - while (enter & 1) {
> - CPU_BUSY_CYCLE();
> - enter = from->tu_gen;
> - }
> -
> - membar_consumer();
> + pc_cons_enter(&from->tu_pcl, &gen);
> + do {
> tmp = *from;
> - membar_consumer();
> - leave = from->tu_gen;
> -
> - if (enter == leave)
> - break;
> - enter = leave;
> - }
> + } while (pc_cons_leave(&from->tu_pcl, &gen) != 0);
>
> tu->tu_uticks += tmp.tu_uticks;
> tu->tu_sticks += tmp.tu_sticks;
> @@ -433,12 +420,14 @@ tuagg_get_process(struct tusage *tu, str
> void
> tuagg_add_process(struct process *pr, struct proc *p)
> {
> + unsigned int gen;
> +
> MUTEX_ASSERT_LOCKED(&pr->ps_mtx);
> KASSERT(curproc == p || p->p_stat == SDEAD);
>
> - tu_enter(&pr->ps_tu);
> + gen = tu_enter(&pr->ps_tu);
> tuagg_sumup(&pr->ps_tu, &p->p_tu);
> - tu_leave(&pr->ps_tu);
> + tu_leave(&pr->ps_tu, gen);
>
> /* Now reset CPU time usage for the thread. */
> timespecclear(&p->p_tu.tu_runtime);
> @@ -452,6 +441,7 @@ tuagg_add_runtime(void)
> struct schedstate_percpu *spc = &curcpu()->ci_schedstate;
> struct proc *p = curproc;
> struct timespec ts, delta;
> + unsigned int gen;
>
> /*
> * Compute the amount of time during which the current
> @@ -472,9 +462,9 @@ tuagg_add_runtime(void)
> }
> /* update spc_runtime */
> spc->spc_runtime = ts;
> - tu_enter(&p->p_tu);
> + gen = tu_enter(&p->p_tu);
> timespecadd(&p->p_tu.tu_runtime, &delta, &p->p_tu.tu_runtime);
> - tu_leave(&p->p_tu);
> + tu_leave(&p->p_tu, gen);
> }
>
> /*
> Index: sys/kern/kern_sysctl.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_sysctl.c,v
> diff -u -p -r1.465 kern_sysctl.c
> --- sys/kern/kern_sysctl.c 27 Apr 2025 00:58:55 -0000 1.465
> +++ sys/kern/kern_sysctl.c 4 May 2025 07:18:11 -0000
> @@ -172,6 +172,8 @@ int hw_sysctl_locked(int *, u_int, void
>
> int (*cpu_cpuspeed)(int *);
>
> +static void sysctl_ci_cp_time(struct cpu_info *, uint64_t *);
> +
> /*
> * Lock to avoid too many processes vslocking a large amount of memory
> * at the same time.
> @@ -682,11 +684,15 @@ kern_sysctl_locked(int *name, u_int name
> memset(cp_time, 0, sizeof(cp_time));
>
> CPU_INFO_FOREACH(cii, ci) {
> + uint64_t ci_cp_time[CPUSTATES];
> +
> if (!cpu_is_online(ci))
> continue;
> +
> n++;
> + sysctl_ci_cp_time(ci, ci_cp_time);
> for (i = 0; i < CPUSTATES; i++)
> - cp_time[i] += ci->ci_schedstate.spc_cp_time[i];
> + cp_time[i] += ci_cp_time[i];
> }
>
> for (i = 0; i < CPUSTATES; i++)
> @@ -2793,12 +2799,27 @@ sysctl_sensors(int *name, u_int namelen,
> }
> #endif /* SMALL_KERNEL */
>
> +static void
> +sysctl_ci_cp_time(struct cpu_info *ci, uint64_t *cp_time)
> +{
> + struct schedstate_percpu *spc = &ci->ci_schedstate;
> + unsigned int gen;
> +
> + pc_cons_enter(&spc->spc_cp_time_lock, &gen);
> + do {
> + int i;
> + for (i = 0; i < CPUSTATES; i++)
> + cp_time[i] = spc->spc_cp_time[i];
> + } while (pc_cons_leave(&spc->spc_cp_time_lock, &gen) != 0);
> +}
> +
> int
> sysctl_cptime2(int *name, u_int namelen, void *oldp, size_t *oldlenp,
> void *newp, size_t newlen)
> {
> CPU_INFO_ITERATOR cii;
> struct cpu_info *ci;
> + uint64_t cp_time[CPUSTATES];
> int found = 0;
>
> if (namelen != 1)
> @@ -2813,9 +2834,10 @@ sysctl_cptime2(int *name, u_int namelen,
> if (!found)
> return (ENOENT);
>
> + sysctl_ci_cp_time(ci, cp_time);
> +
> return (sysctl_rdstruct(oldp, oldlenp, newp,
> - &ci->ci_schedstate.spc_cp_time,
> - sizeof(ci->ci_schedstate.spc_cp_time)));
> + cp_time, sizeof(cp_time)));
> }
>
> #if NAUDIO > 0
> @@ -2881,7 +2903,7 @@ sysctl_cpustats(int *name, u_int namelen
> return (ENOENT);
>
> memset(&cs, 0, sizeof cs);
> - memcpy(&cs.cs_time, &ci->ci_schedstate.spc_cp_time, sizeof(cs.cs_time));
> + sysctl_ci_cp_time(ci, cs.cs_time);
> cs.cs_flags = 0;
> if (cpu_is_online(ci))
> cs.cs_flags |= CPUSTATS_ONLINE;
> Index: sys/kern/sched_bsd.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/sched_bsd.c,v
> diff -u -p -r1.99 sched_bsd.c
> --- sys/kern/sched_bsd.c 10 Mar 2025 09:28:56 -0000 1.99
> +++ sys/kern/sched_bsd.c 4 May 2025 07:18:11 -0000
> @@ -585,6 +585,7 @@ setperf_auto(void *v)
> CPU_INFO_ITERATOR cii;
> struct cpu_info *ci;
> uint64_t idle, total, allidle = 0, alltotal = 0;
> + unsigned int gen;
>
> if (!perfpolicy_dynamic())
> return;
> @@ -609,14 +610,23 @@ setperf_auto(void *v)
> return;
> }
> CPU_INFO_FOREACH(cii, ci) {
> + struct schedstate_percpu *spc;
> +
> if (!cpu_is_online(ci))
> continue;
> - total = 0;
> - for (i = 0; i < CPUSTATES; i++) {
> - total += ci->ci_schedstate.spc_cp_time[i];
> - }
> +
> + spc = &ci->ci_schedstate;
> + pc_cons_enter(&spc->spc_cp_time_lock, &gen);
> + do {
> + total = 0;
> + for (i = 0; i < CPUSTATES; i++) {
> + total += spc->spc_cp_time[i];
> + }
> + idle = spc->spc_cp_time[CP_IDLE];
> + } while (pc_cons_leave(&spc->spc_cp_time_lock, &gen) != 0);
> +
> total -= totalticks[j];
> - idle = ci->ci_schedstate.spc_cp_time[CP_IDLE] - idleticks[j];
> + idle -= idleticks[j];
> if (idle < total / 3)
> speedup = 1;
> alltotal += total;
> Index: sys/sys/pclock.h
> ===================================================================
> RCS file: sys/sys/pclock.h
> diff -N sys/sys/pclock.h
> --- /dev/null 1 Jan 1970 00:00:00 -0000
> +++ sys/sys/pclock.h 4 May 2025 07:18:11 -0000
> @@ -0,0 +1,49 @@
> +/* $OpenBSD$ */
> +
> +/*
> + * Copyright (c) 2023 David Gwynne <dlg@openbsd.org>
> + *
> + * Permission to use, copy, modify, and distribute this software for any
> + * purpose with or without fee is hereby granted, provided that the above
> + * copyright notice and this permission notice appear in all copies.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> + */
> +
> +#ifndef _SYS_PCLOCK_H
> +#define _SYS_PCLOCK_H
> +
> +#include <sys/_lock.h>
> +
> +struct pc_lock {
> + volatile unsigned int pcl_gen;
> +};
> +
> +#ifdef _KERNEL
> +
> +#define PC_LOCK_INITIALIZER() { .pcl_gen = 0 }
> +
> +void pc_lock_init(struct pc_lock *);
> +
> +/* single (non-interlocking) producer */
> +unsigned int pc_sprod_enter(struct pc_lock *);
> +void pc_sprod_leave(struct pc_lock *, unsigned int);
> +
> +/* multiple (interlocking) producers */
> +unsigned int pc_mprod_enter(struct pc_lock *);
> +void pc_mprod_leave(struct pc_lock *, unsigned int);
> +
> +/* consumer */
> +void pc_cons_enter(struct pc_lock *, unsigned int *);
> +__warn_unused_result int
> + pc_cons_leave(struct pc_lock *, unsigned int *);
> +
> +#endif /* _KERNEL */
> +
> +#endif /* _SYS_PCLOCK_H */
> Index: sys/sys/proc.h
> ===================================================================
> RCS file: /cvs/src/sys/sys/proc.h,v
> diff -u -p -r1.387 proc.h
> --- sys/sys/proc.h 2 May 2025 05:04:38 -0000 1.387
> +++ sys/sys/proc.h 4 May 2025 07:18:11 -0000
> @@ -51,6 +51,7 @@
> #include <sys/rwlock.h> /* For struct rwlock */
> #include <sys/sigio.h> /* For struct sigio */
> #include <sys/refcnt.h> /* For struct refcnt */
> +#include <sys/pclock.h>
>
> #ifdef _KERNEL
> #include <sys/atomic.h>
> @@ -91,8 +92,8 @@ struct pgrp {
> * Each thread is immediately accumulated here. For processes only the
> * time of exited threads is accumulated and to get the proper process
> * time usage tuagg_get_process() needs to be called.
> - * Accounting of threads is done lockless by curproc using the tu_gen
> - * generation counter. Code should use tu_enter() and tu_leave() for this.
> + * Accounting of threads is done lockless by curproc using the tu_pcl
> + * pc_lock. Code should use tu_enter() and tu_leave() for this.
> * The process ps_tu structure is locked by the ps_mtx.
> */
> #define TU_UTICKS 0 /* Statclock hits in user mode. */
> @@ -101,7 +102,7 @@ struct pgrp {
> #define TU_TICKS_COUNT 3
>
> struct tusage {
> - uint64_t tu_gen; /* generation counter */
> + struct pc_lock tu_pcl;
> uint64_t tu_ticks[TU_TICKS_COUNT];
> #define tu_uticks tu_ticks[TU_UTICKS]
> #define tu_sticks tu_ticks[TU_STICKS]
> @@ -125,8 +126,6 @@ struct tusage {
> * run-time information needed by threads.
> */
> #ifdef __need_process
> -struct futex;
> -LIST_HEAD(futex_list, futex);
> struct proc;
> struct tslpentry;
> TAILQ_HEAD(tslpqueue, tslpentry);
> @@ -187,7 +186,6 @@ struct process {
> struct vmspace *ps_vmspace; /* Address space */
> pid_t ps_pid; /* [I] Process identifier. */
>
> - struct futex_list ps_ftlist; /* futexes attached to this process */
> struct tslpqueue ps_tslpqueue; /* [p] queue of threads in thrsleep */
> struct rwlock ps_lock; /* per-process rwlock */
> struct mutex ps_mtx; /* per-process mutex */
> @@ -353,9 +351,6 @@ struct proc {
> struct process *p_p; /* [I] The process of this thread. */
> TAILQ_ENTRY(proc) p_thr_link; /* [K|m] Threads in a process linkage. */
>
> - TAILQ_ENTRY(proc) p_fut_link; /* Threads in a futex linkage. */
> - struct futex *p_futex; /* Current sleeping futex. */
> -
> /* substructures: */
> struct filedesc *p_fd; /* copy of p_p->ps_fd */
> struct vmspace *p_vmspace; /* [I] copy of p_p->ps_vmspace */
> @@ -655,18 +650,16 @@ void cpuset_complement(struct cpuset *,
> int cpuset_cardinality(struct cpuset *);
> struct cpu_info *cpuset_first(struct cpuset *);
>
> -static inline void
> +static inline unsigned int
> tu_enter(struct tusage *tu)
> {
> - ++tu->tu_gen; /* make the generation number odd */
> - membar_producer();
> + return pc_sprod_enter(&tu->tu_pcl);
> }
>
> static inline void
> -tu_leave(struct tusage *tu)
> +tu_leave(struct tusage *tu, unsigned int gen)
> {
> - membar_producer();
> - ++tu->tu_gen; /* make the generation number even again */
> + pc_sprod_leave(&tu->tu_pcl, gen);
> }
>
> #endif /* _KERNEL */
> Index: sys/sys/sched.h
> ===================================================================
> RCS file: /cvs/src/sys/sys/sched.h,v
> diff -u -p -r1.73 sched.h
> --- sys/sys/sched.h 8 Jul 2024 14:46:47 -0000 1.73
> +++ sys/sys/sched.h 4 May 2025 07:18:11 -0000
> @@ -97,6 +97,7 @@ struct cpustats {
>
> #include <sys/clockintr.h>
> #include <sys/queue.h>
> +#include <sys/pclock.h>
>
> #define SCHED_NQS 32 /* 32 run queues. */
>
> @@ -112,6 +113,7 @@ struct schedstate_percpu {
> struct timespec spc_runtime; /* time curproc started running */
> volatile int spc_schedflags; /* flags; see below */
> u_int spc_schedticks; /* ticks for schedclock() */
> + struct pc_lock spc_cp_time_lock;
> u_int64_t spc_cp_time[CPUSTATES]; /* CPU state statistics */
> u_char spc_curpriority; /* usrpri of curproc */
>
>