From: Vitaliy Makkoveev Subject: Re: Kernel protection fault in fill_kproc() To: Mark Kettenis Cc: gerhard_roth@genua.de, tech@openbsd.org, mpi@openbsd.org, carsten_beckmann@genua.de Date: Tue, 12 Aug 2025 13:00:02 +0300 On Tue, Aug 12, 2025 at 11:49:12AM +0200, Mark Kettenis wrote: > > Date: Tue, 12 Aug 2025 11:56:10 +0300 > > From: Vitaliy Makkoveev > > > > On Tue, Aug 12, 2025 at 07:22:29AM +0200, Claudio Jeker wrote: > > > On Mon, Aug 11, 2025 at 10:45:05AM +0000, Gerhard Roth wrote: > > > > About a year ago, the call to uvm_exit() was moved outside of theĀ  > > > > KERNEL_LOCK() in the reaper() by mpi@. Now we observed a kernel > > > > protection fault that results from this change. > > > > > > > > In fill_kproc() we read the vmspace pointer (vm) right at the very > > > > beginning of the function: > > > > > > > > struct vmspace *vm = pr->ps_vmspace; > > > > > > > > Sometime later, we try to access it: > > > > > > > > /* fixups that can only be done in the kernel */ > > > > if ((pr->ps_flags & PS_ZOMBIE) == 0) { > > > > if ((pr->ps_flags & PS_EMBRYO) == 0 && vm != NULL) > > > > ki->p_vm_rssize = vm_resident_count(vm); > > > > > > > > > > > > In the meantime the process might have exited and the reaper() can free > > > > the vmspace by calling uvm_exit(). After that, the 'vm' pointer in > > > > fill_kproc() points to stale memory. Accessing it will yield a kernel > > > > protection fault. > > > > > > > > BTW: only after freeing the vmspace of the process, the PS_ZOMBIE flag > > > > is set by the reaper(). > > > > > > > > I propose to put the reaper()'s call to uvm_exit() back under the > > > > kernel lock to avoid the fault. > > > > > > In my opinion the fill_kproc() code is wrong and it should not look at > > > pr->ps_vmspace if the PS_EXITING flag is set for the process. > > > > > > exit1() sets PS_EXITING flag early on and after that point the vm can be > > > purged so the vm_resident_count() is probably wrong anyway. > > I guess that is safe since exit1() still runs with the kernel lock > held. Which means that the PS_EXITING flags can't be set while > fill_kproc() runs, since it holds the kernel lock. > > > The only fill_kproc() is the sysctl_doproc() which does the check in the > > beginning of the allprocess loop: > > > > for (; pr != NULL; pr = LIST_NEXT(pr, ps_list)) { > > /* XXX skip processes in the middle of being zapped */ > > if (pr->ps_pgrp == NULL) > > continue; > > There is some other code where an "external observer" looks at > ps_vmspace in kern/sys_process.c:process_domem(): > > vm = tr->ps_vmspace; > if ((tr->ps_flags & PS_EXITING) || (vm->vm_refcnt < 1)) > return EFAULT; > addr = uio->uio_offset; > > uvmspace_addref(vm); > > error = uvm_io(&vm->vm_map, uio, UVM_IO_FIXPROT); > > uvmspace_free(vm); > > So that checks PS_EXITING as well, but also checks the refcnt. > > As you can see this also takes a reference of the vmspace. I guess > that's necessary since uvm_io() may sleep. The problem lies in the unlocked uvm_exit(pr) which starts teardown of vmspace. Simple adding uvmspace_addref() somewhere else will not help you because there is no guarantees that your uvmspace_addref() is the winner. So you need to serialize the uvm_exit() and the uvmspace_addref() thread. This means it should be moved back under kernel lock. In my initial diff I propose to move uvm_exit(pr) after the kernel locked section of reaper(). This mead the vmspace teardown will start after the process being unlinek from the allprocess or zombprocess lists and not accessed by sysctl(2).