From: Claudio Jeker
Subject: Re: uvm_purge()
To: tech@openbsd.org
Date: Thu, 15 May 2025 20:54:58 +0200

On Thu, May 15, 2025 at 02:11:12PM +0200, Claudio Jeker wrote:
> On Thu, May 15, 2025 at 01:01:07PM +0200, Martin Pieuchot wrote:
> > On 15/05/25(Thu) 11:17, Claudio Jeker wrote:
> > > On Mon, May 12, 2025 at 08:42:40PM +0200, Martin Pieuchot wrote:
> > > > Diff below moves the tearing down of VM space to exit1().  It implies
> > > > that processes will now be charged for cleaning their VM space.  As a
> > > > result their %sys time will increase.
> > > > 
> > > > uvm_purge() is called in the "top" part of exit1() when the process is
> > > > still allowed to sleep.  To allow this piece of code to be executed in
> > > > parallel, all recursive kernel_lock tokens need to be released.
> > > > 
> > > > In the diff below uvm_purge() is called twice.  The first time in
> > > > exit1() and the second time, doing nothing but integrity checks, in
> > > > uvm_exit().  Another approach could be to just check that the map is
> > > > empty.  Any preferences?
> > > > 
> > > > Comments?  Oks?
> > > 
> > > Comments inline.
> > 
> > Thanks for the review.  Answers and new diff below.
> > 
> > > > Index: kern/kern_exit.c
> > > > ===================================================================
> > > > RCS file: /cvs/src/sys/kern/kern_exit.c,v
> > > > diff -u -p -r1.245 kern_exit.c
> > > > --- kern/kern_exit.c	2 May 2025 05:04:38 -0000	1.245
> > > > +++ kern/kern_exit.c	12 May 2025 18:10:35 -0000
> > > > @@ -241,6 +241,14 @@ exit1(struct proc *p, int xexit, int xsi
> > > >  	 */
> > > >  	if (pr->ps_pptr->ps_sigacts->ps_sigflags & SAS_NOCLDWAIT)
> > > >  		atomic_setbits_int(&pr->ps_flags, PS_NOZOMBIE);
> > > > +
> > > > +	/* Teardown the virtual address space. */
> > > > +#ifdef MULTIPROCESSOR
> > > > +	__mp_release_all(&kernel_lock);
> > > > +#endif
> > > > +	uvm_purge(pr);
> > > > +	KERNEL_LOCK();
> > > > +
> > > 
> > > I prefer if PS_NOZOMBIE is set at the end (after uvm_purge).
> > > I know it should not matter but it makes more sense in my head to mark a
> > > process with PS_NOZOMBIE close to the end of exit1.
> > 
> > I agree, such diff is ok mpi@.  I won't mix this refactoring with the
> > introduction of uvm_purge() though.
> 
> Sure, we can do this after this is in.
> 
> > > > Index: uvm/uvm_glue.c
> > > > ===================================================================
> > > > RCS file: /cvs/src/sys/uvm/uvm_glue.c,v
> > > > diff -u -p -r1.88 uvm_glue.c
> > > > --- uvm/uvm_glue.c	21 Mar 2025 13:19:33 -0000	1.88
> > > > +++ uvm/uvm_glue.c	12 May 2025 18:05:34 -0000
> > > > @@ -286,6 +286,19 @@ uvm_uarea_free(struct proc *p)
> > > >  }
> > > >  
> > > >  /*
> > > > + * uvm_purge: teardown a virtual address space
> > > > + */
> > > > +void
> > > > +uvm_purge(struct process *pr)
> > > > +{
> > > > +	struct vmspace *vm = pr->ps_vmspace;
> > > > +
> > > > +	KERNEL_ASSERT_UNLOCKED();
> > > > +
> > > > +	uvmspace_purge(vm);
> > > > +}
> > > 
> > > I think here you miss a check.  Neither uvm_purge() nor uvmspace_purge()
> > > check the vmspace refcount.  In my diff I used this here:
> > > 
> > > 	if (atomic_load_int(&vm->vm_refcnt) != 1)
> > > 		return;
> > > 
> > > At least kthread_exit() calls exit1() with a shared vm_refcnt of proc0
> > > and without that check a kthread_exit() would purge the kernel vmspace
> > > which is very bad.
> > 
> > Indeed.  I forgot about kernel threads exiting.  I deliberately decided
> > to not check for refcounts for userland processes.  See below.
> > 
> > > The other places where the vm_refcnt is increased, process_domem()
> > > (ptrace) and sysctl_proc_args(), both check PS_EXITING upfront but then
> > > sleep while holding the reference.  So if the process exits while those
> > > are asleep we also end up in fireworks.  For those parts I'm not even
> > > sure if going through exit1, reaper and process_zap before those call
> > > uvmspace_free() would not result in some use-after-free or other bad
> > > access.
> > > I try to look at all the logic there and think about the consequences
> > > when it comes to vmspace references.
> > 
> > When a process wants to peek at the VM space of a dying one, the
> > vm_map_lock() is what guarantees integrity of the data structures.
> > So in the worst case the VM space will appear empty.
> 
> So that would then result in a failed uvm_io operation.  I guess that is
> fine.
> 
> > I decided to not care about reference counts for userland processes
> > because I want to be sure the process itself will be charged for
> > cleaning its VM space.  This can be seen as a huge munmap(2) of the
> > whole address space.
> 
> I agree with that.  We want the process to do the work in exit1 and not
> have some other thread do this on its behalf.
> 
> > The reference protects the 'struct vmspace' which needs to stay around
> > for the reasons mentioned above.
> 
> I am more worried that some code in this exit1, reaper, process_zap jungle
> would change a value inside struct vmspace that would break the assumption
> you made about the map lock.  I had a look and did not spot anything right
> away.
> 
> > For those reasons the updated diff below skips uvmspace_purge() in
> > exit1() for kernel threads.
> 
> This is fine with me.
> 
> > Done.  I also moved cleanups needed for VMMAP_DEBUG builds where they
> > belong.
> 
> Some bikeshed requests below.  Apart from that OK claudio@
> I will throw it onto some systems to test.

amd64 did a make build and make regress and all is ok.
sparc64 make build still running but this looks good as well.
arm64 hit some problems on my first try, 2nd make build is now started.
Let's see.

-- 
:wq Claudio