
From: Mark Kettenis <mark.kettenis@xs4all.nl>
Subject: Re: uvm_purge()
To: Claudio Jeker <cjeker@diehard.n-r-g.com>
Cc: tech@openbsd.org
Date: Fri, 16 May 2025 15:34:21 +0200


> Date: Fri, 16 May 2025 07:37:26 +0200
> From: Claudio Jeker <cjeker@diehard.n-r-g.com>
> 
> On Thu, May 15, 2025 at 08:54:58PM +0200, Claudio Jeker wrote:
> > On Thu, May 15, 2025 at 02:11:12PM +0200, Claudio Jeker wrote:
> > > On Thu, May 15, 2025 at 01:01:07PM +0200, Martin Pieuchot wrote:
> > > > On 15/05/25(Thu) 11:17, Claudio Jeker wrote:
> > > > > On Mon, May 12, 2025 at 08:42:40PM +0200, Martin Pieuchot wrote:
> > > > > > Diff below moves the tearing down of VM space to exit1().  It implies
> > > > > > that processes will now be charged for cleaning their VM space.  As a
> > > > > > result their %sys time will increase.
> > > > > > 
> > > > > > uvm_purge() is called in the "top" part of exit1() when the process is
> > > > > > still allowed to sleep.  To allow this piece of code to be executed in
> > > > > > parallel, all recursive kernel_lock tokens need to be released. 
> > > > > > 
> > > > > > In the diff below uvm_purge() is called twice.  The first time in
> > > > > > exit1() and the second time, doing nothing but integrity checks, in
> > > > > > uvm_exit().  Another approach could be to just check that the map is
> > > > > > empty.  Any preferences?
> > > > > > 
> > > > > > Comments?  Oks?
> > > > > 
> > > > > Comments inline.
> > > > 
> > > > Thanks for the review.  Answers and new diff below.
> > > > 
> > > > >  
> > > > > > Index: kern/kern_exit.c
> > > > > > ===================================================================
> > > > > > RCS file: /cvs/src/sys/kern/kern_exit.c,v
> > > > > > diff -u -p -r1.245 kern_exit.c
> > > > > > --- kern/kern_exit.c	2 May 2025 05:04:38 -0000	1.245
> > > > > > +++ kern/kern_exit.c	12 May 2025 18:10:35 -0000
> > > > > > @@ -241,6 +241,14 @@ exit1(struct proc *p, int xexit, int xsi
> > > > > >  		 */
> > > > > >  		if (pr->ps_pptr->ps_sigacts->ps_sigflags & SAS_NOCLDWAIT)
> > > > > >  			atomic_setbits_int(&pr->ps_flags, PS_NOZOMBIE);
> > > > > > +
> > > > > > +		/* Teardown the virtual address space. */
> > > > > > +#ifdef MULTIPROCESSOR
> > > > > > +		__mp_release_all(&kernel_lock);
> > > > > > +#endif
> > > > > > +		uvm_purge(pr);
> > > > > > +		KERNEL_LOCK();
> > > > > > +
> > > > > 
> > > > > I prefer if PS_NOZOMBIE is set at the end (after uvm_purge).
> > > > > I know it should not matter but it makes more sense in my head to mark a
> > > > > process with PS_NOZOMBIE close to the end of exit1.
> > > > 
> > > > I agree, such diff is ok mpi@.  I won't mix this refactoring with the
> > > > introduction of uvm_purge() though.
> > > 
> > > Sure, we can do this after this is in.
> > >  
> > > > > > Index: uvm/uvm_glue.c
> > > > > > ===================================================================
> > > > > > RCS file: /cvs/src/sys/uvm/uvm_glue.c,v
> > > > > > diff -u -p -r1.88 uvm_glue.c
> > > > > > --- uvm/uvm_glue.c	21 Mar 2025 13:19:33 -0000	1.88
> > > > > > +++ uvm/uvm_glue.c	12 May 2025 18:05:34 -0000
> > > > > > @@ -286,6 +286,19 @@ uvm_uarea_free(struct proc *p)
> > > > > >  }
> > > > > >  
> > > > > >  /*
> > > > > > + * uvm_purge: teardown a virtual address space
> > > > > > + */
> > > > > > +void
> > > > > > +uvm_purge(struct process *pr)
> > > > > > +{
> > > > > > +	struct vmspace *vm = pr->ps_vmspace;
> > > > > > +
> > > > > > +	KERNEL_ASSERT_UNLOCKED();
> > > > > > +
> > > > > > +	uvmspace_purge(vm);
> > > > > > +}
> > > > > 
> > > > > I think here you miss a check. Neither uvm_purge() nor uvmspace_purge()
> > > > > check the vmspace refcount. In my diff I used this here:
> > > > > 
> > > > > 	if (atomic_load_int(&vm->vm_refcnt) != 1)
> > > > > 		return;
> > > > > 
> > > > > At least kthread_exit() calls exit1() with a shared vm_refcnt of proc0
> > > > > and without that check a kthread_exit() would purge the kernel vmspace
> > > > > which is very bad.
> > > > 
> > > > Indeed.  I forgot about kernel threads exiting.  I deliberately decided
> > > > to not check for refcounts for userland processes.   See below.
> > > > 
> > > > > The other places, where the vm_refcnt is increased, are in process_domem()
> > > > > (ptrace) and in sysctl_proc_args() both check PS_EXITING upfront but then
> > > > > sleep while holding the reference. So if the process exits while those are
> > > > > asleep we also end up in fireworks. For those parts I'm not even sure if
> > > > > going through exit1, reaper and process_zap before those call uvmspace_free()
> > > > > would not result in some use-after-free or other bad access.
> > > > > I'll try to look at all the logic there and think through the consequences
> > > > > when it comes to vmspace references.
> > > > 
> > > > When a process wants to peek at the VM space of a dying one, the
> > > > vm_map_lock() is what guarantees integrity of the data structures.
> > > > So in the worst case the VM space will appear empty.
> > > 
> > > So that would then result in a failed uvm_io operation. I guess that is
> > > fine.
> > >  
> > > > I decided to not care about reference counts for userland processes
> > > > because I want to be sure the process itself will be charged for
> > > > cleaning its VM space.  This can be seen as a huge munmap(2) of the
> > > > whole address space.
> > > 
> > > I agree with that. We want the process to do the work in exit1 and not
> > > have some other thread do this on its behalf.
> > > 
> > > > The reference protects the 'struct vmspace' which needs to stay around
> > > > for the reasons mentioned above.
> > > 
> > > I am more worried that some code in this exit1, reaper, process_zap jungle
> > > would change a value inside struct vmspace that would break the assumption
> > > you made about the map lock. I had a look and did not spot anything right
> > > away.
> > >  
> > > > For those reasons the updated diff below skips uvmspace_purge() in exit1() for
> > > > kernel threads.
> > > 
> > > This is fine with me.
> > >  
> > > > Done.  I also moved cleanups needed for VMMAP_DEBUG builds where they belong.
> > > 
> > > Some bikeshed requests below. Apart from that OK claudio@
> > > I will throw it onto some systems to test.
> > 
> > amd64 did a make build and make regress and all is ok.
> > sparc64 make build still running but this looks good as well.
> > arm64 hit some problems on my first try; a 2nd make build is now started.
> > Let's see.
> 
> I am not able to make build on my arm64 with the uvm_purge diff. I think
> there is some bad interaction with what kettenis did in pmap. A few
> minutes into the build the system locks up hard.

I doubt it.  The uvm_purge diff makes the optimization I did
ineffective, so in a sense it makes pmap behave more like it did in the
past (i.e. doing all the TLB flushes it used to do).

> So not sure how to progress here. I should probably try kettenis's diff on
> top of this and see.

If the hangs didn't happen before, or are more frequent now, we're
probably hitting bugs (in uvm or pmap) due to having more parallelism.
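
For reference, the guard discussed upthread (skip the purge when the
vmspace is still shared, as with kernel threads running on proc0's
vmspace) boils down to something like the untested userland sketch
below; the struct fields and helpers here are simplified stand-ins for
the real sys/uvm code, not the actual kernel definitions:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified model of the relevant vmspace state. */
struct vmspace {
	atomic_int vm_refcnt;	/* references held on this vmspace */
	bool purged;		/* stand-in for the map being torn down */
};

/* Stand-in for the real uvmspace_purge(): tear down the map entries. */
static void
uvmspace_purge(struct vmspace *vm)
{
	vm->purged = true;
}

/*
 * Skip the purge when the vmspace is shared.  Kernel threads exit
 * while sharing proc0's vmspace; purging that would tear down the
 * kernel map, which is very bad.
 */
static void
uvm_purge(struct vmspace *vm)
{
	if (atomic_load(&vm->vm_refcnt) != 1)
		return;
	uvmspace_purge(vm);
}
```

With this guard a process holding the only reference gets its map torn
down in exit1(), while a kthread_exit() sharing proc0's vmspace leaves
it alone; the remaining references then go away via uvmspace_free() as
usual.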