Index | Thread | Search

From:
Gerhard Roth <gerhard_roth@genua.de>
Subject:
Re: [EXT] Re: Kernel protection fault in fill_kproc()
To:
"mvs@openbsd.org" <mvs@openbsd.org>
Cc:
"dv@sisu.io" <dv@sisu.io>, "mpi@openbsd.org" <mpi@openbsd.org>, "tech@openbsd.org" <tech@openbsd.org>, Carsten Beckmann <carsten_beckmann@genua.de>
Date:
Mon, 11 Aug 2025 15:07:42 +0000

Download raw body.

Thread
On Mon, 2025-08-11 at 18:00 +0300, Vitaliy Makkoveev wrote:
> On Mon, Aug 11, 2025 at 02:52:40PM +0000, Gerhard Roth wrote:
> > On Mon, 2025-08-11 at 10:34 -0400, Dave Voutila wrote:
> > > Gerhard Roth <gerhard_roth@genua.de> writes:
> > > 
> > > > About a year ago, the call to uvm_exit() was moved outside of the
> > > > KERNEL_LOCK() in the reaper() by mpi@. Now we observed a kernel
> > > > protection fault that results from this change.
> > > > 
> > > > In fill_kproc() we read the vmspace pointer (vm) right at the very
> > > > beginning of the function:
> > > > 
> > > >         struct vmspace *vm = pr->ps_vmspace;
> > > > 
> > > > Sometime later, we try to access it:
> > > > 
> > > >         /* fixups that can only be done in the kernel */
> > > >         if ((pr->ps_flags & PS_ZOMBIE) == 0) {
> > > >                 if ((pr->ps_flags & PS_EMBRYO) == 0 && vm != NULL)
> > > >                         ki->p_vm_rssize = vm_resident_count(vm);
> > > > 
> > > > 
> > > > In the meantime the process might have exited and the reaper() can free
> > > > the vmspace by calling uvm_exit(). After that, the 'vm' pointer in
> > > > fill_kproc() points to stale memory. Accessing it will yield a kernel
> > > > protection fault.
> > > > 
> > > > BTW: only after freeing the vmspace of the process, the PS_ZOMBIE flag
> > > > is set by the reaper().
> > > > 
> > > > I propose to put the reaper()'s call to uvm_exit() back under the
> > > > kernel lock to avoid the fault.
> > > 
> > > I don't think this is the correct approach.
> > > 
> > > I don't tend to work in this area, but this looks possibly related to
> > > unlocking in sysctl given fill_kproc() is seeing the memory issues. A
> > > lot has changed in kern_sysctl.c in the past few months.
> > 
> > fill_kproc() holds the kernel lock while accessing the processe's vmspace
> > while the reaper() doesn't. So it's the unlocking in the reaper() that
> > introduced the problem, not the unlocking in fill_kproc().
> > 
> 
> I'm not the fan of moving uvm_exit(pr); back to kernel lock. It seems it
> could be moved this kernel locked section of reaper(). Or the the extra
> reference of the `ps_vmspace' coud be taken in the fill_kproc() path.

I fully understand that, but no better solution came to my mind.
More than glad, if you could find one!

Below is a patch that just adds some (huge) delays to the kernel.
With this patch applied it is easy to reproduce the fault.
So if you have an alternate solution, this will help to verify the fix.


> 
> > 
> > > 
> > > > 
> > > > Gerhard
> > > > 
> > > > 
> > > > Index: sys/kern/kern_exit.c
> > > > ===================================================================
> > > > RCS file: /cvs/src/sys/kern/kern_exit.c,v
> > > > diff -u -p -u -p -r1.252 kern_exit.c
> > > > --- sys/kern/kern_exit.c        10 Aug 2025 15:17:57 -0000      1.252
> > > > +++ sys/kern/kern_exit.c        11 Aug 2025 10:30:57 -0000
> > > > @@ -498,10 +498,15 @@ reaper(void *arg)
> > > >                 } else {
> > > >                         struct process *pr = p->p_p;
> > > > 
> > > > -                       /* Release the rest of the process's vmspace */
> > > > +                       /*
> > > > +                        * Release the rest of the process's vmspace
> > > > +                        * Use the kernel lock to avoid a race with fill_kproc()
> > > > +                        * accessing the vmspace while the process isn't yet a
> > > > +                        * zombie.
> > > > +                        */
> > > > +                       KERNEL_LOCK();
> > > >                         uvm_exit(pr);
> > > > 
> > > > -                       KERNEL_LOCK();
> > > >                         if ((pr->ps_flags & PS_NOZOMBIE) == 0) {
> > > >                                 /* Process is now a true zombie. */
> > > >                                 atomic_setbits_int(&pr->ps_flags, PS_ZOMBIE);
> > 
> 
> 

Index: sys/kern/kern_exit.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_exit.c,v
diff -u -p -u -p -r1.251 kern_exit.c
--- sys/kern/kern_exit.c	3 Jun 2025 08:38:17 -0000	1.251
+++ sys/kern/kern_exit.c	7 Aug 2025 08:04:54 -0000
@@ -499,6 +499,7 @@ reaper(void *arg)
 			struct process *pr = p->p_p;
 
 			/* Release the rest of the process's vmspace */
+			delay(200 * 1000);
 			uvm_exit(pr);
 
 			KERNEL_LOCK();
Index: sys/kern/kern_sysctl.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_sysctl.c,v
diff -u -p -u -p -r1.482 kern_sysctl.c
--- sys/kern/kern_sysctl.c	6 Aug 2025 14:00:33 -0000	1.482
+++ sys/kern/kern_sysctl.c	7 Aug 2025 08:04:54 -0000
@@ -159,7 +159,7 @@ int sysctl_hwbattery(int *, u_int, void 
 
 void fill_file(struct kinfo_file *, struct file *, struct filedesc *, int,
     struct vnode *, struct process *, struct proc *, struct socket *, int);
-void fill_kproc(struct process *, struct kinfo_proc *, struct proc *, int);
+void fill_kproc(struct process *, struct kinfo_proc *, struct proc *, int, int);
 
 int kern_sysctl_locked(int *, u_int, void *, size_t *, void *, size_t,
 	struct proc *);
@@ -1994,7 +1994,7 @@ again:
 		}
 
 		if (buflen >= elem_size && elem_count > 0) {
-			fill_kproc(pr, kproc, NULL, show_pointers);
+			fill_kproc(pr, kproc, NULL, show_pointers, doingzomb);
 			error = copyout(kproc, dp, elem_size);
 			if (error)
 				goto err;
@@ -2010,7 +2010,7 @@ again:
 
 		TAILQ_FOREACH(p, &pr->ps_threads, p_thr_link) {
 			if (buflen >= elem_size && elem_count > 0) {
-				fill_kproc(pr, kproc, p, show_pointers);
+				fill_kproc(pr, kproc, p, show_pointers, doingzomb);
 				error = copyout(kproc, dp, elem_size);
 				if (error)
 					goto err;
@@ -2047,7 +2047,7 @@ err:
  */
 void
 fill_kproc(struct process *pr, struct kinfo_proc *ki, struct proc *p,
-    int show_pointers)
+    int show_pointers, int doingzomb)
 {
 	struct session *s = pr->ps_session;
 	struct tty *tp;
@@ -2083,6 +2083,8 @@ fill_kproc(struct process *pr, struct ki
 
 	/* fixups that can only be done in the kernel */
 	if ((pr->ps_flags & PS_ZOMBIE) == 0) {
+		if (doingzomb)
+			delay(1000 * 1000);
 		if ((pr->ps_flags & PS_EMBRYO) == 0 && vm != NULL)
 			ki->p_vm_rssize = vm_resident_count(vm);
 		calctsru(&tu, &ut, &st, NULL);