Download raw body.
Kernel protection fault in fill_kproc()
On Tue, Aug 12, 2025 at 11:49:12AM +0200, Mark Kettenis wrote:
> > Date: Tue, 12 Aug 2025 11:56:10 +0300
> > From: Vitaliy Makkoveev <mvs@openbsd.org>
> >
> > On Tue, Aug 12, 2025 at 07:22:29AM +0200, Claudio Jeker wrote:
> > > On Mon, Aug 11, 2025 at 10:45:05AM +0000, Gerhard Roth wrote:
> > > > About a year ago, the call to uvm_exit() was moved outside of theĀ
> > > > KERNEL_LOCK() in the reaper() by mpi@. Now we observed a kernel
> > > > protection fault that results from this change.
> > > >
> > > > In fill_kproc() we read the vmspace pointer (vm) right at the very
> > > > beginning of the function:
> > > >
> > > > struct vmspace *vm = pr->ps_vmspace;
> > > >
> > > > Sometime later, we try to access it:
> > > >
> > > > /* fixups that can only be done in the kernel */
> > > > if ((pr->ps_flags & PS_ZOMBIE) == 0) {
> > > > if ((pr->ps_flags & PS_EMBRYO) == 0 && vm != NULL)
> > > > ki->p_vm_rssize = vm_resident_count(vm);
> > > >
> > > >
> > > > In the meantime the process might have exited and the reaper() can free
> > > > the vmspace by calling uvm_exit(). After that, the 'vm' pointer in
> > > > fill_kproc() points to stale memory. Accessing it will yield a kernel
> > > > protection fault.
> > > >
> > > > BTW: only after freeing the vmspace of the process, the PS_ZOMBIE flag
> > > > is set by the reaper().
> > > >
> > > > I propose to put the reaper()'s call to uvm_exit() back under the
> > > > kernel lock to avoid the fault.
> > >
> > > In my opinion the fill_kproc() code is wrong and it should not look at
> > > pr->ps_vmspace if the PS_EXITING flag is set for the process.
> > >
> > > exit1() sets PS_EXITING flag early on and after that point the vm can be
> > > purged so the vm_resident_count() is probably wrong anyway.
>
> I guess that is safe since exit1() still runs with the kernel lock
> held. Which means that the PS_EXITING flags can't be set while
> fill_kproc() runs, since it holds the kernel lock.
>
> > The only fill_kproc() is the sysctl_doproc() which does the check in the
> > beginning of the allprocess loop:
> >
> > for (; pr != NULL; pr = LIST_NEXT(pr, ps_list)) {
> > /* XXX skip processes in the middle of being zapped */
> > if (pr->ps_pgrp == NULL)
> > continue;
>
> There is some other code where an "external observer" looks at
> ps_vmspace in kern/sys_process.c:process_domem():
>
> vm = tr->ps_vmspace;
> if ((tr->ps_flags & PS_EXITING) || (vm->vm_refcnt < 1))
> return EFAULT;
> addr = uio->uio_offset;
>
> uvmspace_addref(vm);
>
> error = uvm_io(&vm->vm_map, uio, UVM_IO_FIXPROT);
>
> uvmspace_free(vm);
>
> So that checks PS_EXITING as well, but also checks the refcnt.
>
> As you can see this also takes a reference of the vmspace. I guess
> that's necessary since uvm_io() may sleep.
The problem lies in the unlocked uvm_exit(pr) which starts teardown of
vmspace. Simple adding uvmspace_addref() somewhere else will not help
you because there is no guarantees that your uvmspace_addref() is the
winner. So you need to serialize the uvm_exit() and the
uvmspace_addref() thread. This means it should be moved back under
kernel lock.
In my initial diff I propose to move uvm_exit(pr) after the kernel
locked section of reaper(). This mead the vmspace teardown will start
after the process being unlinek from the allprocess or zombprocess lists
and not accessed by sysctl(2).
Kernel protection fault in fill_kproc()