Download raw body.
Kernel protection fault in fill_kproc()
On Tue, Aug 12, 2025 at 12:40:21PM +0200, Martin Pieuchot wrote:
> On 12/08/25(Tue) 13:00, Vitaliy Makkoveev wrote:
> > On Tue, Aug 12, 2025 at 11:49:12AM +0200, Mark Kettenis wrote:
> > > > Date: Tue, 12 Aug 2025 11:56:10 +0300
> > > > From: Vitaliy Makkoveev <mvs@openbsd.org>
> > > >
> > > > On Tue, Aug 12, 2025 at 07:22:29AM +0200, Claudio Jeker wrote:
> > > > > On Mon, Aug 11, 2025 at 10:45:05AM +0000, Gerhard Roth wrote:
> > > > > > About a year ago, the call to uvm_exit() was moved outside of theĀ
> > > > > > KERNEL_LOCK() in the reaper() by mpi@. Now we observed a kernel
> > > > > > protection fault that results from this change.
> > > > > >
> > > > > > In fill_kproc() we read the vmspace pointer (vm) right at the very
> > > > > > beginning of the function:
> > > > > >
> > > > > > struct vmspace *vm = pr->ps_vmspace;
> > > > > >
> > > > > > Sometime later, we try to access it:
> > > > > >
> > > > > > /* fixups that can only be done in the kernel */
> > > > > > if ((pr->ps_flags & PS_ZOMBIE) == 0) {
> > > > > > if ((pr->ps_flags & PS_EMBRYO) == 0 && vm != NULL)
> > > > > > ki->p_vm_rssize = vm_resident_count(vm);
> > > > > >
> > > > > >
> > > > > > In the meantime the process might have exited and the reaper() can free
> > > > > > the vmspace by calling uvm_exit(). After that, the 'vm' pointer in
> > > > > > fill_kproc() points to stale memory. Accessing it will yield a kernel
> > > > > > protection fault.
> > > > > >
> > > > > > BTW: only after freeing the vmspace of the process, the PS_ZOMBIE flag
> > > > > > is set by the reaper().
> > > > > >
> > > > > > I propose to put the reaper()'s call to uvm_exit() back under the
> > > > > > kernel lock to avoid the fault.
> > > > >
> > > > > In my opinion the fill_kproc() code is wrong and it should not look at
> > > > > pr->ps_vmspace if the PS_EXITING flag is set for the process.
> > > > >
> > > > > exit1() sets PS_EXITING flag early on and after that point the vm can be
> > > > > purged so the vm_resident_count() is probably wrong anyway.
> > >
> > > I guess that is safe since exit1() still runs with the kernel lock
> > > held. Which means that the PS_EXITING flags can't be set while
> > > fill_kproc() runs, since it holds the kernel lock.
> > >
> > > > The only fill_kproc() is the sysctl_doproc() which does the check in the
> > > > beginning of the allprocess loop:
> > > >
> > > > for (; pr != NULL; pr = LIST_NEXT(pr, ps_list)) {
> > > > /* XXX skip processes in the middle of being zapped */
> > > > if (pr->ps_pgrp == NULL)
> > > > continue;
> > >
> > > There is some other code where an "external observer" looks at
> > > ps_vmspace in kern/sys_process.c:process_domem():
> > >
> > > vm = tr->ps_vmspace;
> > > if ((tr->ps_flags & PS_EXITING) || (vm->vm_refcnt < 1))
> > > return EFAULT;
> > > addr = uio->uio_offset;
> > >
> > > uvmspace_addref(vm);
> > >
> > > error = uvm_io(&vm->vm_map, uio, UVM_IO_FIXPROT);
> > >
> > > uvmspace_free(vm);
> > >
> > > So that checks PS_EXITING as well, but also checks the refcnt.
> > >
> > > As you can see this also takes a reference of the vmspace. I guess
> > > that's necessary since uvm_io() may sleep.
> >
> > The problem lies in the unlocked uvm_exit(pr) which starts teardown of
> > vmspace. Simple adding uvmspace_addref() somewhere else will not help
> > you because there is no guarantees that your uvmspace_addref() is the
> > winner. So you need to serialize the uvm_exit() and the
> > uvmspace_addref() thread. This means it should be moved back under
> > kernel lock.
>
> Indeed, clearing `ps_vmspace' isn't safe without KERNEL_LOCK(). The
> drawback of that approach is that pmap_destroy(), which is costly, will
> no longer be executed in parallel.
>
> > In my initial diff I propose to move uvm_exit(pr) after the kernel
> > locked section of reaper(). This mead the vmspace teardown will start
> > after the process being unlinek from the allprocess or zombprocess lists
> > and not accessed by sysctl(2).
>
> I'd suggest skipping dereferencing `ps_vmspace' for now to be coherent
> with what is done by sysctl_proc_args().
>
> Accessing zombie process descriptors out of the KERNEL_LOCK() would be a
> must for killing the reaper. Anyone interested?
>
> Diff below is untested, I'm too busy atm.
>
You do lockless uvm_exit(pr) before setting PS_ZOMBIE flag. This is the
case I described below, no guarantees that your uvmspace_addref(vm)
thread is the winner.
Putting the uvmspace_free(vm) just after the kernel locked section in
reaper makes it safe, because at this point the process is not accessible
from the alloprocess or zombprocess loops. Also this fixes the
sysctl_proc_args(), which has the same issue with unlocked uvm_exit(pr)
before setting PS_ZOMBIE flag.
> Index: kern/kern_sysctl.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_sysctl.c,v
> diff -u -p -r1.482 kern_sysctl.c
> --- kern/kern_sysctl.c 6 Aug 2025 14:00:33 -0000 1.482
> +++ kern/kern_sysctl.c 12 Aug 2025 10:35:07 -0000
> @@ -2051,11 +2051,17 @@ fill_kproc(struct process *pr, struct ki
> {
> struct session *s = pr->ps_session;
> struct tty *tp;
> - struct vmspace *vm = pr->ps_vmspace;
> + struct vmspace *vm = NULL;
> struct timespec booted, st, ut, utc;
> struct tusage tu;
> int isthread;
>
> + /* exiting/zombie process might no longer have VM space. */
> + if ((pr->ps_flags & (PS_ZOMBIE|PS_EXITING)) == 0) {
> + vm = pr->ps_vmspace;
> + uvmspace_addref(vm);
> + }
> +
> isthread = p != NULL;
> if (!isthread) {
> p = pr->ps_mainproc; /* XXX */
> @@ -2102,6 +2108,8 @@ fill_kproc(struct process *pr, struct ki
> ki->p_cpuid = CPU_INFO_UNIT(p->p_cpu);
> #endif
> }
> +
> + uvmspace_free(vm);
>
> /* get %cpu and schedule state: just one thread or sum of all? */
> if (isthread) {
>
>
Kernel protection fault in fill_kproc()