Download raw body.
Kernel protection fault in fill_kproc()
On Wed, Aug 13, 2025 at 02:35:39PM +0200, Martin Pieuchot wrote:
> On 12/08/25(Tue) 14:36, Vitaliy Makkoveev wrote:
> > On Tue, Aug 12, 2025 at 12:40:21PM +0200, Martin Pieuchot wrote:
> > > On 12/08/25(Tue) 13:00, Vitaliy Makkoveev wrote:
> > > > On Tue, Aug 12, 2025 at 11:49:12AM +0200, Mark Kettenis wrote:
> > > > > > Date: Tue, 12 Aug 2025 11:56:10 +0300
> > > > > > From: Vitaliy Makkoveev <mvs@openbsd.org>
> > > > > >
> > > > > > On Tue, Aug 12, 2025 at 07:22:29AM +0200, Claudio Jeker wrote:
> > > > > > > On Mon, Aug 11, 2025 at 10:45:05AM +0000, Gerhard Roth wrote:
> > > > > > > > About a year ago, the call to uvm_exit() was moved outside of theĀ
> > > > > > > > KERNEL_LOCK() in the reaper() by mpi@. Now we observed a kernel
> > > > > > > > protection fault that results from this change.
> > > > > > > >
> > > > > > > > In fill_kproc() we read the vmspace pointer (vm) right at the very
> > > > > > > > beginning of the function:
> > > > > > > >
> > > > > > > > struct vmspace *vm = pr->ps_vmspace;
> > > > > > > >
> > > > > > > > Sometime later, we try to access it:
> > > > > > > >
> > > > > > > > /* fixups that can only be done in the kernel */
> > > > > > > > if ((pr->ps_flags & PS_ZOMBIE) == 0) {
> > > > > > > > if ((pr->ps_flags & PS_EMBRYO) == 0 && vm != NULL)
> > > > > > > > ki->p_vm_rssize = vm_resident_count(vm);
> > > > > > > >
> > > > > > > >
> > > > > > > > In the meantime the process might have exited and the reaper() can free
> > > > > > > > the vmspace by calling uvm_exit(). After that, the 'vm' pointer in
> > > > > > > > fill_kproc() points to stale memory. Accessing it will yield a kernel
> > > > > > > > protection fault.
> > > > > > > >
> > > > > > > > BTW: only after freeing the vmspace of the process, the PS_ZOMBIE flag
> > > > > > > > is set by the reaper().
> > > > > > > >
> > > > > > > > I propose to put the reaper()'s call to uvm_exit() back under the
> > > > > > > > kernel lock to avoid the fault.
> > > > > > >
> > > > > > > In my opinion the fill_kproc() code is wrong and it should not look at
> > > > > > > pr->ps_vmspace if the PS_EXITING flag is set for the process.
> > > > > > >
> > > > > > > exit1() sets PS_EXITING flag early on and after that point the vm can be
> > > > > > > purged so the vm_resident_count() is probably wrong anyway.
> > > > >
> > > > > I guess that is safe since exit1() still runs with the kernel lock
> > > > > held. Which means that the PS_EXITING flags can't be set while
> > > > > fill_kproc() runs, since it holds the kernel lock.
> > > > >
> > > > > > The only fill_kproc() is the sysctl_doproc() which does the check in the
> > > > > > beginning of the allprocess loop:
> > > > > >
> > > > > > for (; pr != NULL; pr = LIST_NEXT(pr, ps_list)) {
> > > > > > /* XXX skip processes in the middle of being zapped */
> > > > > > if (pr->ps_pgrp == NULL)
> > > > > > continue;
> > > > >
> > > > > There is some other code where an "external observer" looks at
> > > > > ps_vmspace in kern/sys_process.c:process_domem():
> > > > >
> > > > > vm = tr->ps_vmspace;
> > > > > if ((tr->ps_flags & PS_EXITING) || (vm->vm_refcnt < 1))
> > > > > return EFAULT;
> > > > > addr = uio->uio_offset;
> > > > >
> > > > > uvmspace_addref(vm);
> > > > >
> > > > > error = uvm_io(&vm->vm_map, uio, UVM_IO_FIXPROT);
> > > > >
> > > > > uvmspace_free(vm);
> > > > >
> > > > > So that checks PS_EXITING as well, but also checks the refcnt.
> > > > >
> > > > > As you can see this also takes a reference of the vmspace. I guess
> > > > > that's necessary since uvm_io() may sleep.
> > > >
> > > > The problem lies in the unlocked uvm_exit(pr) which starts teardown of
> > > > vmspace. Simple adding uvmspace_addref() somewhere else will not help
> > > > you because there is no guarantees that your uvmspace_addref() is the
> > > > winner. So you need to serialize the uvm_exit() and the
> > > > uvmspace_addref() thread. This means it should be moved back under
> > > > kernel lock.
> > >
> > > Indeed, clearing `ps_vmspace' isn't safe without KERNEL_LOCK(). The
> > > drawback of that approach is that pmap_destroy(), which is costly, will
> > > no longer be executed in parallel.
> > >
> > > > In my initial diff I propose to move uvm_exit(pr) after the kernel
> > > > locked section of reaper(). This mead the vmspace teardown will start
> > > > after the process being unlinek from the allprocess or zombprocess lists
> > > > and not accessed by sysctl(2).
> > >
> > > I'd suggest skipping dereferencing `ps_vmspace' for now to be coherent
> > > with what is done by sysctl_proc_args().
> > >
> > > Accessing zombie process descriptors out of the KERNEL_LOCK() would be a
> > > must for killing the reaper. Anyone interested?
> > >
> > > Diff below is untested, I'm too busy atm.
> > >
> >
> > You do lockless uvm_exit(pr) before setting PS_ZOMBIE flag. This is the
> > case I described below, no guarantees that your uvmspace_addref(vm)
> > thread is the winner.
>
> Please re-read the diff, there's such guarantees thanks to PS_EXITING
> which is set before uvm_purge().
>
So, the PS_EXITING bit is enough? For that purpose do you setting
PS_ZOMBIE?
> This pattern is already present in multiple places in the tree.
>
> > Putting the uvmspace_free(vm) just after the kernel locked section in
> > reaper makes it safe, because at this point the process is not accessible
> > from the alloprocess or zombprocess loops. Also this fixes the
> > sysctl_proc_args(), which has the same issue with unlocked uvm_exit(pr)
> > before setting PS_ZOMBIE flag.
>
> This choice is not helping with unlocking more of the reaper and
> introduces incoherency. This is IMHO the wrong path to follow.
>
> > > Index: kern/kern_sysctl.c
> > > ===================================================================
> > > RCS file: /cvs/src/sys/kern/kern_sysctl.c,v
> > > diff -u -p -r1.482 kern_sysctl.c
> > > --- kern/kern_sysctl.c 6 Aug 2025 14:00:33 -0000 1.482
> > > +++ kern/kern_sysctl.c 12 Aug 2025 10:35:07 -0000
> > > @@ -2051,11 +2051,17 @@ fill_kproc(struct process *pr, struct ki
> > > {
> > > struct session *s = pr->ps_session;
> > > struct tty *tp;
> > > - struct vmspace *vm = pr->ps_vmspace;
> > > + struct vmspace *vm = NULL;
> > > struct timespec booted, st, ut, utc;
> > > struct tusage tu;
> > > int isthread;
> > >
> > > + /* exiting/zombie process might no longer have VM space. */
> > > + if ((pr->ps_flags & (PS_ZOMBIE|PS_EXITING)) == 0) {
> > > + vm = pr->ps_vmspace;
> > > + uvmspace_addref(vm);
> > > + }
> > > +
> > > isthread = p != NULL;
> > > if (!isthread) {
> > > p = pr->ps_mainproc; /* XXX */
> > > @@ -2102,6 +2108,8 @@ fill_kproc(struct process *pr, struct ki
> > > ki->p_cpuid = CPU_INFO_UNIT(p->p_cpu);
> > > #endif
> > > }
> > > +
> > > + uvmspace_free(vm);
> > >
> > > /* get %cpu and schedule state: just one thread or sum of all? */
> > > if (isthread) {
> > >
> > >
> >
>
>
Kernel protection fault in fill_kproc()