Download raw body.
uvm_purge()
> Date: Fri, 16 May 2025 07:37:26 +0200
> From: Claudio Jeker <cjeker@diehard.n-r-g.com>
>
> On Thu, May 15, 2025 at 08:54:58PM +0200, Claudio Jeker wrote:
> > On Thu, May 15, 2025 at 02:11:12PM +0200, Claudio Jeker wrote:
> > > On Thu, May 15, 2025 at 01:01:07PM +0200, Martin Pieuchot wrote:
> > > > On 15/05/25(Thu) 11:17, Claudio Jeker wrote:
> > > > > On Mon, May 12, 2025 at 08:42:40PM +0200, Martin Pieuchot wrote:
> > > > > > Diff below moves the tearing down of VM space to exit1(). It implies
> > > > > > that processes will now be charged for cleaning their VM space. As a
> > > > > > result their %sys time will increase.
> > > > > >
> > > > > > uvm_purge() is called in the "top" part of exit1() when the process is
> > > > > > still allowed to sleep. To allow this piece of code to be executed in
> > > > > > parallel, all recursive kernel_lock tokens need to be released.
> > > > > >
> > > > > > In the diff below uvm_purge() is called twice. The first time in
> > > > > > exit1() and the second time, doing nothing but integrity checks, in
> > > > > > uvm_exit(). Another approach could be to just check that the map is
> > > > > > empty. Any preferences?
> > > > > >
> > > > > > Comments? Oks?
> > > > >
> > > > > Comments inline.
> > > >
> > > > Thanks for the review. Answers and new diff below.
> > > >
> > > > >
> > > > > > Index: kern/kern_exit.c
> > > > > > ===================================================================
> > > > > > RCS file: /cvs/src/sys/kern/kern_exit.c,v
> > > > > > diff -u -p -r1.245 kern_exit.c
> > > > > > --- kern/kern_exit.c 2 May 2025 05:04:38 -0000 1.245
> > > > > > +++ kern/kern_exit.c 12 May 2025 18:10:35 -0000
> > > > > > @@ -241,6 +241,14 @@ exit1(struct proc *p, int xexit, int xsi
> > > > > > */
> > > > > > if (pr->ps_pptr->ps_sigacts->ps_sigflags & SAS_NOCLDWAIT)
> > > > > > atomic_setbits_int(&pr->ps_flags, PS_NOZOMBIE);
> > > > > > +
> > > > > > + /* Teardown the virtual address space. */
> > > > > > +#ifdef MULTIPROCESSOR
> > > > > > + __mp_release_all(&kernel_lock);
> > > > > > +#endif
> > > > > > + uvm_purge(pr);
> > > > > > + KERNEL_LOCK();
> > > > > > +
> > > > >
> > > > > I prefer if PS_NOZOMBIE is set at the end (after uvm_purge).
> > > > > I know it should not matter but it makes more sense in my head to mark a
> > > > > process with PS_NOZOMBIE close to the end of exit1.
> > > >
> > > > I agree, such diff is ok mpi@. I won't mix this refactoring with the
> > > > introduction of uvm_purge() though.
> > >
> > > Sure, we can do this after this is in.
> > >
> > > > > > Index: uvm/uvm_glue.c
> > > > > > ===================================================================
> > > > > > RCS file: /cvs/src/sys/uvm/uvm_glue.c,v
> > > > > > diff -u -p -r1.88 uvm_glue.c
> > > > > > --- uvm/uvm_glue.c 21 Mar 2025 13:19:33 -0000 1.88
> > > > > > +++ uvm/uvm_glue.c 12 May 2025 18:05:34 -0000
> > > > > > @@ -286,6 +286,19 @@ uvm_uarea_free(struct proc *p)
> > > > > > }
> > > > > >
> > > > > > /*
> > > > > > + * uvm_purge: teardown a virtual address space
> > > > > > + */
> > > > > > +void
> > > > > > +uvm_purge(struct process *pr)
> > > > > > +{
> > > > > > + struct vmspace *vm = pr->ps_vmspace;
> > > > > > +
> > > > > > + KERNEL_ASSERT_UNLOCKED();
> > > > > > +
> > > > > > + uvmspace_purge(vm);
> > > > > > +}
> > > > >
> > > > > I think here you miss a check. Neither uvm_purge() nor uvmspace_purge()
> > > > > check the vmspace refcount. In my diff I used this here:
> > > > >
> > > > > if (atomic_load_int(&vm->vm_refcnt) != 1)
> > > > > return;
> > > > >
> > > > > At least kthread_exit() calls exit1() with a shared vm_refcnt of proc0
> > > > > and without that check a kthread_exit() would purge the kernel vmspace
> > > > > which is very bad.
> > > >
> > > > Indeed. I forgot about kernel threads exiting. I deliberately decided
> > > > to not check for refcounts for userland processes. See below.
> > > >
> > > > > The other places, where the vm_refcnt is increased, are in process_domem()
> > > > > (ptrace) and in sysctl_proc_args() both check PS_EXITING upfront but then
> > > > > sleep while holding the reference. So if the process exits while those are
> > > > > asleep we also end up in fireworks. For those parts I'm not even sure if
> > > > > going through exit1, reaper and process_zap before those call uvmspace_free()
> > > > > would not result in some use-after-free or other bad access.
> > > > > I try to look at all the logic there and think about the consequences when
> > > > > it comes to vmspace references.
> > > >
> > > > When a process wants to peak at the VM space of a dying one, the
> > > > vm_map_lock() is what guarantees integrity of the data structures.
> > > > So in the worst case the VM space will appear empty.
> > >
> > > So that would then result in a failed uvm_io operation. I guess that is
> > > fine.
> > >
> > > > I decided to not care about reference counts for userland processes
> > > > because I want to be sure the process itself will be charged for
> > > > cleaning its VM space. This can be seen as a huge munmap(2) of the
> > > > whole address space.
> > >
> > > I agree with that. We want the process to do the work in exit1 and not
> > > have some other thread do this on behalf.
> > >
> > > > The reference protects the 'struct vmspace' which needs to stay around
> > > > for the reasons mentioned above.
> > >
> > > I am more worried that some code in this exit1, reaper, process_zap jungle
> > > would change a value inside struct vmspace that would break the assumption
> > > you made about the map lock. I had a look and did not spot anything right
> > > away.
> > >
> > > > For that reasons updated diff below skips uvmspace_purge() in exit1() for
> > > > kernel threads.
> > >
> > > This is fine with me.
> > >
> > > > Done. I also moved cleanups needed for VMMAP_DEBUG builds where they belong.
> > >
> > > Some bikeshed requests below. Apart from that OK claudio@
> > > I will throw it onto some systems to test.
> >
> > amd64 did a make build and make regress and all is ok.
> > sparc64 make build still running but this looks good as well.
> > arm64, hit some problems on my first try, 2nd make build is now started.
> > Lets see.
>
> I am not able to make build on my arm64 with the uvm_purge diff. I think
> there is some bad interaction with what kettenis did in pmap. A few
> minutes in the build the system locks up hard.
I doubt it. The uvm_purge diff makes the optimization I did
ineffective so in a sense it makes pmap behave more like it did in the
past (i.e. doing all the TLB flushes it used to do).
> So not sure how to progress here. I should probably try kettenis's diff on
> top of this and see.
If the hangs didn't happen before, or are more frequent now, we're
probably hitting bugs (in uvm or pmap) due to having more parallelism.
uvm_purge()