From: Vitaliy Makkoveev
Subject: Re: [EXT] Re: Kernel protection fault in fill_kproc()
To: Gerhard Roth
Cc: "dv@sisu.io", "mpi@openbsd.org", "tech@openbsd.org", Carsten Beckmann
Date: Mon, 11 Aug 2025 18:42:33 +0300

On Mon, Aug 11, 2025 at 03:07:42PM +0000, Gerhard Roth wrote:
> On Mon, 2025-08-11 at 18:00 +0300, Vitaliy Makkoveev wrote:
> > On Mon, Aug 11, 2025 at 02:52:40PM +0000, Gerhard Roth wrote:
> > > On Mon, 2025-08-11 at 10:34 -0400, Dave Voutila wrote:
> > > > Gerhard Roth writes:
> > > >
> > > > > About a year ago, the call to uvm_exit() was moved outside of the
> > > > > KERNEL_LOCK() in the reaper() by mpi@. Now we observed a kernel
> > > > > protection fault that results from this change.
> > > > >
> > > > > In fill_kproc() we read the vmspace pointer (vm) right at the very
> > > > > beginning of the function:
> > > > >
> > > > >         struct vmspace *vm = pr->ps_vmspace;
> > > > >
> > > > > Sometime later, we try to access it:
> > > > >
> > > > >         /* fixups that can only be done in the kernel */
> > > > >         if ((pr->ps_flags & PS_ZOMBIE) == 0) {
> > > > >                 if ((pr->ps_flags & PS_EMBRYO) == 0 && vm != NULL)
> > > > >                         ki->p_vm_rssize = vm_resident_count(vm);
> > > > >
> > > > >
> > > > > In the meantime the process might have exited and the reaper() can free
> > > > > the vmspace by calling uvm_exit(). After that, the 'vm' pointer in
> > > > > fill_kproc() points to stale memory. Accessing it will yield a kernel
> > > > > protection fault.
> > > > >
> > > > > BTW: only after freeing the vmspace of the process, the PS_ZOMBIE flag
> > > > > is set by the reaper().
> > > > >
> > > > > I propose to put the reaper()'s call to uvm_exit() back under the
> > > > > kernel lock to avoid the fault.
> > > >
> > > > I don't think this is the correct approach.
> > > >
> > > > I don't tend to work in this area, but this looks possibly related to
> > > > unlocking in sysctl given fill_kproc() is seeing the memory issues. A
> > > > lot has changed in kern_sysctl.c in the past few months.
> > >
> > > fill_kproc() holds the kernel lock while accessing the process's vmspace
> > > while the reaper() doesn't. So it's the unlocking in the reaper() that
> > > introduced the problem, not the unlocking in fill_kproc().
> > >
> >
> > I'm not a fan of moving uvm_exit(pr); back under the kernel lock. It
> > seems it could be moved after the kernel-locked section of reaper(). Or
> > an extra reference to `ps_vmspace' could be taken in the fill_kproc()
> > path.
>
> I fully understand that, but no better solution came to mind.
> I'd be more than glad if you could find one!
>
> Below is a patch that just adds some (huge) delays to the kernel.
> With this patch applied it is easy to reproduce the fault.
> So if you have an alternate solution, this will help to verify the fix.
>
> > > > > Gerhard
> > > > >
> > > > >
> > > > > Index: sys/kern/kern_exit.c
> > > > > ===================================================================
> > > > > RCS file: /cvs/src/sys/kern/kern_exit.c,v
> > > > > diff -u -p -u -p -r1.252 kern_exit.c
> > > > > --- sys/kern/kern_exit.c        10 Aug 2025 15:17:57 -0000      1.252
> > > > > +++ sys/kern/kern_exit.c        11 Aug 2025 10:30:57 -0000
> > > > > @@ -498,10 +498,15 @@ reaper(void *arg)
> > > > >                 } else {
> > > > >                         struct process *pr = p->p_p;
> > > > >
> > > > > -                       /* Release the rest of the process's vmspace */
> > > > > +                       /*
> > > > > +                        * Release the rest of the process's vmspace
> > > > > +                        * Use the kernel lock to avoid a race with fill_kproc()
> > > > > +                        * accessing the vmspace while the process isn't yet a
> > > > > +                        * zombie.
> > > > > +                        */
> > > > > +                       KERNEL_LOCK();
> > > > >                         uvm_exit(pr);
> > > > >
> > > > > -                       KERNEL_LOCK();
> > > > >                         if ((pr->ps_flags & PS_NOZOMBIE) == 0) {
> > > > >                                 /* Process is now a true zombie. */
> > > > >                                 atomic_setbits_int(&pr->ps_flags, PS_ZOMBIE);
> > > >

I propose something like the diff below. The corresponding sysctl(2) path
is kernel locked, so the reaper() will wait for the kernel lock to be
released before it starts process teardown and calls uvmspace_free(). The
copyout() within sysctl_doproc() will not cause a context switch. I haven't
tested this diff, but it should work.

Index: sys/kern/kern_exit.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_exit.c,v
diff -u -p -r1.251 kern_exit.c
--- sys/kern/kern_exit.c	3 Jun 2025 08:38:17 -0000	1.251
+++ sys/kern/kern_exit.c	11 Aug 2025 15:38:06 -0000
@@ -497,9 +497,7 @@ reaper(void *arg)
 			proc_free(p);
 		} else {
 			struct process *pr = p->p_p;
-
-			/* Release the rest of the process's vmspace */
-			uvm_exit(pr);
+			struct vmspace *vm = pr->ps_vmspace;
 
 			KERNEL_LOCK();
 			if ((pr->ps_flags & PS_NOZOMBIE) == 0) {
@@ -521,6 +519,9 @@ reaper(void *arg)
 				process_zap(pr);
 			}
 			KERNEL_UNLOCK();
+
+			/* Release the rest of the process's vmspace */
+			uvmspace_free(vm);
 		}
 	}
 }
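
For comparison, the other option raised upthread was to take an extra
reference to `ps_vmspace' in the fill_kproc() path. Below is a minimal
userland sketch of that pattern, assuming C11 atomics. The names vm_model,
vm_model_alloc(), vm_model_addref() and vm_model_release() are hypothetical
stand-ins, not the kernel's struct vmspace or the uvm API. The sketch also
does not show how the reader would obtain the pointer safely in the first
place, which a real fix would still need to serialize.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userland stand-in for struct vmspace; not the kernel type. */
struct vm_model {
	atomic_int	refcnt;		/* number of current holders */
	long		rssize;		/* stands in for vm_resident_count() */
};

/* Allocate the object with one reference, owned by the "process". */
struct vm_model *
vm_model_alloc(long rssize)
{
	struct vm_model *vm = malloc(sizeof(*vm));

	if (vm == NULL)
		return NULL;
	atomic_init(&vm->refcnt, 1);
	vm->rssize = rssize;
	return vm;
}

/* Reader side: pin the object before looking inside it. */
void
vm_model_addref(struct vm_model *vm)
{
	/* Taking a reference is only valid while another holder exists. */
	assert(atomic_load(&vm->refcnt) > 0);
	atomic_fetch_add(&vm->refcnt, 1);
}

/*
 * Drop one reference. Returns 1 if this call freed the object,
 * 0 if other holders remain.
 */
int
vm_model_release(struct vm_model *vm)
{
	if (atomic_fetch_sub(&vm->refcnt, 1) == 1) {
		free(vm);
		return 1;
	}
	return 0;
}
```

With this pattern the reader pins the object before reading rssize and
drops the pin afterwards, so whichever side releases last performs the
free, and an early release by the reaper side no longer leaves the reader
with a stale pointer.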