
From:
Vitaliy Makkoveev <mvs@openbsd.org>
Subject:
Re: [EXT] Re: Kernel protection fault in fill_kproc()
To:
Gerhard Roth <gerhard_roth@genua.de>
Cc:
"dv@sisu.io" <dv@sisu.io>, "mpi@openbsd.org" <mpi@openbsd.org>, "tech@openbsd.org" <tech@openbsd.org>, Carsten Beckmann <carsten_beckmann@genua.de>
Date:
Mon, 11 Aug 2025 18:42:33 +0300

On Mon, Aug 11, 2025 at 03:07:42PM +0000, Gerhard Roth wrote:
> On Mon, 2025-08-11 at 18:00 +0300, Vitaliy Makkoveev wrote:
> > On Mon, Aug 11, 2025 at 02:52:40PM +0000, Gerhard Roth wrote:
> > > On Mon, 2025-08-11 at 10:34 -0400, Dave Voutila wrote:
> > > > Gerhard Roth <gerhard_roth@genua.de> writes:
> > > > 
> > > > > About a year ago, the call to uvm_exit() was moved outside of the
> > > > > KERNEL_LOCK() in the reaper() by mpi@. Now we observed a kernel
> > > > > protection fault that results from this change.
> > > > > 
> > > > > In fill_kproc() we read the vmspace pointer (vm) right at the very
> > > > > beginning of the function:
> > > > > 
> > > > >         struct vmspace *vm = pr->ps_vmspace;
> > > > > 
> > > > > Sometime later, we try to access it:
> > > > > 
> > > > >         /* fixups that can only be done in the kernel */
> > > > >         if ((pr->ps_flags & PS_ZOMBIE) == 0) {
> > > > >                 if ((pr->ps_flags & PS_EMBRYO) == 0 && vm != NULL)
> > > > >                         ki->p_vm_rssize = vm_resident_count(vm);
> > > > > 
> > > > > 
> > > > > In the meantime the process might have exited and the reaper() can free
> > > > > the vmspace by calling uvm_exit(). After that, the 'vm' pointer in
> > > > > fill_kproc() points to stale memory. Accessing it will yield a kernel
> > > > > protection fault.
> > > > > 
> > > > > BTW: only after freeing the vmspace of the process, the PS_ZOMBIE flag
> > > > > is set by the reaper().
> > > > > 
> > > > > I propose to put the reaper()'s call to uvm_exit() back under the
> > > > > kernel lock to avoid the fault.
> > > > 
> > > > I don't think this is the correct approach.
> > > > 
> > > > I don't tend to work in this area, but this looks possibly related to
> > > > unlocking in sysctl given fill_kproc() is seeing the memory issues. A
> > > > lot has changed in kern_sysctl.c in the past few months.
> > > 
> > > fill_kproc() holds the kernel lock while accessing the process's vmspace
> > > while the reaper() doesn't. So it's the unlocking in the reaper() that
> > > introduced the problem, not the unlocking in fill_kproc().
> > > 
> > 
> > I'm not a fan of moving uvm_exit(pr) back under the kernel lock. It seems
> > it could be moved after the kernel locked section of reaper(). Or an extra
> > reference to `ps_vmspace' could be taken in the fill_kproc() path.
> 
> I fully understand that, but no better solution came to my mind.
> More than glad, if you could find one!
> 
> Below is a patch that just adds some (huge) delays to the kernel.
> With this patch applied it is easy to reproduce the fault.
> So if you have an alternate solution, this will help to verify the fix.
> 
> 
> > 
> > > 
> > > > 
> > > > > 
> > > > > Gerhard
> > > > > 
> > > > > 
> > > > > Index: sys/kern/kern_exit.c
> > > > > ===================================================================
> > > > > RCS file: /cvs/src/sys/kern/kern_exit.c,v
> > > > > diff -u -p -u -p -r1.252 kern_exit.c
> > > > > --- sys/kern/kern_exit.c        10 Aug 2025 15:17:57 -0000      1.252
> > > > > +++ sys/kern/kern_exit.c        11 Aug 2025 10:30:57 -0000
> > > > > @@ -498,10 +498,15 @@ reaper(void *arg)
> > > > >                 } else {
> > > > >                         struct process *pr = p->p_p;
> > > > > 
> > > > > -                       /* Release the rest of the process's vmspace */
> > > > > +                       /*
> > > > > +                        * Release the rest of the process's vmspace
> > > > > +                        * Use the kernel lock to avoid a race with fill_kproc()
> > > > > +                        * accessing the vmspace while the process isn't yet a
> > > > > +                        * zombie.
> > > > > +                        */
> > > > > +                       KERNEL_LOCK();
> > > > >                         uvm_exit(pr);
> > > > > 
> > > > > -                       KERNEL_LOCK();
> > > > >                         if ((pr->ps_flags & PS_NOZOMBIE) == 0) {
> > > > >                                 /* Process is now a true zombie. */
> > > > >                                 atomic_setbits_int(&pr->ps_flags, PS_ZOMBIE);
> > > 
> > 
> > 
> 

I propose to do something like below. The corresponding sysctl(2) path
is kernel locked, so reaper() will wait for the kernel lock to be
released before it starts the process teardown and calls
uvmspace_free(). The copyout() within sysctl_doproc() will not cause a
context switch. I didn't test this diff, but it should work.

Index: sys/kern/kern_exit.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_exit.c,v
diff -u -p -r1.251 kern_exit.c
--- sys/kern/kern_exit.c	3 Jun 2025 08:38:17 -0000	1.251
+++ sys/kern/kern_exit.c	11 Aug 2025 15:38:06 -0000
@@ -497,9 +497,7 @@ reaper(void *arg)
 			proc_free(p);
 		} else {
 			struct process *pr = p->p_p;
-
-			/* Release the rest of the process's vmspace */
-			uvm_exit(pr);
+			struct vmspace *vm = pr->ps_vmspace;
 
 			KERNEL_LOCK();
 			if ((pr->ps_flags & PS_NOZOMBIE) == 0) {
@@ -521,6 +519,9 @@ reaper(void *arg)
 				process_zap(pr);
 			}
 			KERNEL_UNLOCK();
+
+			/* Release the rest of the process's vmspace */
+			uvmspace_free(vm);
 		}
 	}
 }
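
The ordering in the diff above can be modelled in userland. This is only a rough sketch, not kernel code: `toy_process`, `toy_vmspace`, `toy_fill()`, `toy_reap()` and `big_lock` are hypothetical stand-ins for struct process, struct vmspace, fill_kproc(), reaper() and the kernel lock. It assumes the reader checks the zombie flag and dereferences the vmspace under the same lock that the reaper holds while setting the flag, and it ignores the PS_NOZOMBIE / process_zap() path entirely.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userland stand-ins; these are not the kernel's types. */
struct toy_vmspace {
	int resident_pages;
};

struct toy_process {
	int zombie;			/* models PS_ZOMBIE */
	struct toy_vmspace *vm;		/* models ps_vmspace */
};

/* Models KERNEL_LOCK()/KERNEL_UNLOCK(). */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Models the fill_kproc() side: the flag check and the dereference
 * both happen while the lock is held.  Returns -1 if the vmspace
 * must not be touched.
 */
static int
toy_fill(struct toy_process *pr)
{
	int rss = -1;

	pthread_mutex_lock(&big_lock);
	if (!pr->zombie && pr->vm != NULL)
		rss = pr->vm->resident_pages;
	pthread_mutex_unlock(&big_lock);

	return rss;
}

/*
 * Models the reordered reaper(): remember the vmspace pointer, mark
 * the process a zombie under the lock, and free the vmspace only
 * after the lock has been dropped.
 */
static void
toy_reap(struct toy_process *pr)
{
	struct toy_vmspace *vm = pr->vm;

	pthread_mutex_lock(&big_lock);
	pr->zombie = 1;
	pthread_mutex_unlock(&big_lock);

	free(vm);
}
```

In this model a reader that takes the lock before toy_reap() finishes its read before the flag is set, and a reader that takes it afterwards sees the flag and never follows the stale pointer. Whether the same holds in the kernel depends on fill_kproc() not sleeping between the flag check and the access.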