From: Martin Pieuchot <mpi@grenadille.net>
Subject: Re: [EXT] Re: Kernel protection fault in fill_kproc()
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: tech@openbsd.org
Date: Sat, 13 Sep 2025 10:58:22 +0200

On 13/09/25(Sat) 01:59, Mateusz Guzik wrote:
> On Thu, Aug 14, 2025 at 10:59:16AM +0000, Martin Pieuchot wrote:
> 
> To quote some other parts of the thread:
> 
> > > > In my opinion the fill_kproc() code is wrong and it should not look at
> > > > pr->ps_vmspace if the PS_EXITING flag is set for the process.
> > > > 
> > > > exit1() sets PS_EXITING flag early on and after that point the vm can be
> > > > purged so the vm_resident_count() is probably wrong anyway.
> > 
> > I guess that is safe since exit1() still runs with the kernel lock
> > held.  Which means that the PS_EXITING flags can't be set while
> > fill_kproc() runs, since it holds the kernel lock.
> > 
> 
> 
> I agree with the above sentiment, and with this in mind I disagree with
> the patch.

Let me rephrase:  you disagree with the existing logic.  The diff I sent
follow this logic.

> Trying to catch all the ways fill_kproc() can race against process
> teardown is an avoidable exercise in misery and can be handled the same
> way it was in sysctl_file(): utilize a refcount which stalls process
> exit.

Maybe.  Please show a diff, I'll be delighted to review it.  That said
this is not trivial at all and my diff is a step in this direction.

> I do have some criticism of the way it was implemented over there though

Again, send a diff and maybe we might work together.  I'm well aware of
all you wrote, so what?

> On the exit side there is bunch of exiting-related flags being set,
> including PS_EXITING and there is:
> 	refcnt_finalize(&pr->ps_refcnt, "psdtor");
> 
> In sysctl_file():
> 	KERNEL_LOCK();
> 	[snip]
> 	if (pr->ps_flags & (PS_SYSTEM | PS_EMBRYO | PS_EXITING))
> 		continue;
> 
> 	refcnt_take(&pr->ps_refcnt);
> 	....
> 	refcnt_rele_wake(&pr->ps_refcnt);
> 
> I presume this works because proc exit also takes KERNEL_LOCK, but this
> is bound to cause trouble at some point as the lock is pushed down.

Then push it down and you might fix the issues.

> Instead this relationship should be expicitly spelled out in relevant
> helpers, so that if someone does work in the area and forgets to cover
> it with new locking, they get a nice splat.

Then go ahead and spell it out.

> pseudo-code wise, something like this:

That wouldn't work for other reasons.  Please use our time wisely and
start by writing, testing and explaining a diff.

> /*
>  * One day this will be a per-process lock instead of kernel lock
>  */
> void
> proc_drain_holders(struct proc *pr)
> {
> 	ASSERT_KERNEL_LOCKED();
> 	KASSERT(pr->ps_flags & PS_EXITING);
> 	refcnt_finalize(&pr->ps_refcnt, "psdtor");
> }
> 
> bool
> proc_tryhold(struct proc *pr)
> {
> 	ASSERT_KERNEL_LOCKED();
> 	if (pr->ps_flags & PS_EXITING)
> 		return false;
> 	refcnt_take(&pr->ps_refcnt);
> 	return true;
> }
> 
> bool
> proc_rele(struct proc *pr)
> {
> 	ASSERT_KERNEL_LOCKED();
> 	refcnt_rele_wake(&pr->ps_refcnt);
> }
> 
> Then sysctl_file() and fill_kproc() can do this, KERNEL_LOCK'ed or not:
> 
> 	if (!proc_tryhold(pr)) {
> 		/* already exiting, pretent it's not there */
> 		....
> 	}
> 
> 	/* found and secured against exit */
> 
> 	/* done with it */
> 	proc_rele(pr);
> 
> This in particular protects against the vmspace going down.