From: Claudio Jeker Subject: Re: kern_resorce: move to lazy updates after donice To: Tim Leslie Cc: "tech@openbsd.org" Date: Mon, 16 Feb 2026 16:17:43 +0100 On Sat, Feb 14, 2026 at 01:18:31PM +0000, Tim Leslie wrote: > > @tech, > > Diff below to remove the immediate setpriority update loop in kern_resource. > > Currently, donice grabs ps_mtx and SCHED_LOCK to call setpriority on all > threads. However, setpriority only updates p_usrpri; it does not move > SRUN threads to their new runqueue buckets. That re-bucketing only > happens in schedcpu (once per second). > > Since the scheduler doesn't act on the new priority until schedcpu > anyway, there is no latency benefit to updating p_usrpri immediately for > runnable threads. The good thing is that posix gives us a lot of rope when implementing this API e.g for setpriority: ... [PS|TPS] Any processes or threads using SCHED_FIFO or SCHED_RR shall be unaffected by a call to setpriority(). This is not considered an error. A process which subsequently reverts to SCHED_OTHER need not have its priority affected by such a setpriority() call. End] The effect of changing the nice value may vary depending on the process-scheduling algorithm in effect. And nice has: [PS|TPS] Calling the nice() function has no effect on the priority of processes or threads with policy SCHED_FIFO or SCHED_RR. The effect on processes or threads with other scheduling policies is implementation-defined. So it seems we can do whatever we like here. > The other states are also covered: > SSLEEP: Priority is recalculated in setrunnable upon wakeup. > SONPROC: Priority is updated in schedclock (hardclock). This is not quite correct. If the sleep or stop state is less than a second then setpriority() is not called on wakeup. Instead schedcpu() does that at some point (before or after the wakeup). For SONPROC the update does indeed happen in schedclock() but has no influence on what is run. This happens much later via roundrobin() and a call to preempt(). At that point the p_usrpri needs to be updated to trigger the correct decision. I think the priority handling in the scheduler is sloppy at best (if not actually buggy). I'm a bit worried that removing this code will trigger a problem later on when e.g. the scheduler is changed since people will forget to adjust the setpriority() call to properly adjust the run priority. > As such, doing less is an improvement. WRITE_ONCE added to avoid any > read tears. Uhm. WRITE_ONCE only ensures that the write is made once. It does not really help the readers. This part of the diff I would drop for now. If we want to go this way then all accesses to ps_nice need to happen using the same style. > diff --git a/sys/kern/kern_resource.c b/sys/kern/kern_resource.c > --- a/sys/kern/kern_resource.c > +++ b/sys/kern/kern_resource.c > @@ -47,6 +47,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -196,7 +197,6 @@ int > donice(struct proc *curp, struct process *chgpr, int n) > { > struct ucred *ucred = curp->p_ucred; > - struct proc *p; > > if (ucred->cr_uid != 0 && ucred->cr_ruid != 0 && > ucred->cr_uid != chgpr->ps_ucred->cr_uid && > @@ -209,14 +209,7 @@ donice(struct proc *curp, struct process *chgpr, int n) > n += NZERO; > if (n < chgpr->ps_nice && suser(curp)) > return (EACCES); > - chgpr->ps_nice = n; > - mtx_enter(&chgpr->ps_mtx); > - SCHED_LOCK(); > - TAILQ_FOREACH(p, &chgpr->ps_threads, p_thr_link) { > - setpriority(p, p->p_estcpu, n); > - } > - SCHED_UNLOCK(); > - mtx_leave(&chgpr->ps_mtx); > + WRITE_ONCE(chgpr->ps_nice, n); > return (0); > } > > -- :wq Claudio