From: Claudio Jeker Subject: Re: uvm_purge() To: tech@openbsd.org Date: Fri, 16 May 2025 07:37:26 +0200 On Thu, May 15, 2025 at 08:54:58PM +0200, Claudio Jeker wrote: > On Thu, May 15, 2025 at 02:11:12PM +0200, Claudio Jeker wrote: > > On Thu, May 15, 2025 at 01:01:07PM +0200, Martin Pieuchot wrote: > > > On 15/05/25(Thu) 11:17, Claudio Jeker wrote: > > > > On Mon, May 12, 2025 at 08:42:40PM +0200, Martin Pieuchot wrote: > > > > > Diff below moves the tearing down of VM space to exit1(). It implies > > > > > that processes will now be charged for cleaning their VM space. As a > > > > > result their %sys time will increase. > > > > > > > > > > uvm_purge() is called in the "top" part of exit1() when the process is > > > > > still allowed to sleep. To allow this piece of code to be executed in > > > > > parallel, all recursive kernel_lock tokens need to be released. > > > > > > > > > > In the diff below uvm_purge() is called twice. The first time in > > > > > exit1() and the second time, doing nothing but integrity checks, in > > > > > uvm_exit(). Another approach could be to just check that the map is > > > > > empty. Any preferences? > > > > > > > > > > Comments? Oks? > > > > > > > > Comments inline. > > > > > > Thanks for the review. Answers and new diff below. > > > > > > > > > > > > Index: kern/kern_exit.c > > > > > =================================================================== > > > > > RCS file: /cvs/src/sys/kern/kern_exit.c,v > > > > > diff -u -p -r1.245 kern_exit.c > > > > > --- kern/kern_exit.c 2 May 2025 05:04:38 -0000 1.245 > > > > > +++ kern/kern_exit.c 12 May 2025 18:10:35 -0000 > > > > > @@ -241,6 +241,14 @@ exit1(struct proc *p, int xexit, int xsi > > > > > */ > > > > > if (pr->ps_pptr->ps_sigacts->ps_sigflags & SAS_NOCLDWAIT) > > > > > atomic_setbits_int(&pr->ps_flags, PS_NOZOMBIE); > > > > > + > > > > > + /* Teardown the virtual address space. */ > > > > > +#ifdef MULTIPROCESSOR > > > > > + __mp_release_all(&kernel_lock); > > > > > +#endif > > > > > + uvm_purge(pr); > > > > > + KERNEL_LOCK(); > > > > > + > > > > > > > > I prefer if PS_NOZOMBIE is set at the end (after uvm_purge). > > > > I know it should not matter but it makes more sense in my head to mark a > > > > process with PS_NOZOMBIE close to the end of exit1. > > > > > > I agree, such diff is ok mpi@. I won't mix this refactoring with the > > > introduction of uvm_purge() though. > > > > Sure, we can do this after this is in. > > > > > > > Index: uvm/uvm_glue.c > > > > > =================================================================== > > > > > RCS file: /cvs/src/sys/uvm/uvm_glue.c,v > > > > > diff -u -p -r1.88 uvm_glue.c > > > > > --- uvm/uvm_glue.c 21 Mar 2025 13:19:33 -0000 1.88 > > > > > +++ uvm/uvm_glue.c 12 May 2025 18:05:34 -0000 > > > > > @@ -286,6 +286,19 @@ uvm_uarea_free(struct proc *p) > > > > > } > > > > > > > > > > /* > > > > > + * uvm_purge: teardown a virtual address space > > > > > + */ > > > > > +void > > > > > +uvm_purge(struct process *pr) > > > > > +{ > > > > > + struct vmspace *vm = pr->ps_vmspace; > > > > > + > > > > > + KERNEL_ASSERT_UNLOCKED(); > > > > > + > > > > > + uvmspace_purge(vm); > > > > > +} > > > > > > > > I think here you miss a check. Neither uvm_purge() nor uvmspace_purge() > > > > check the vmspace refcount. In my diff I used this here: > > > > > > > > if (atomic_load_int(&vm->vm_refcnt) != 1) > > > > return; > > > > > > > > At least kthread_exit() calls exit1() with a shared vm_refcnt of proc0 > > > > and without that check a kthread_exit() would purge the kernel vmspace > > > > which is very bad. > > > > > > Indeed. I forgot about kernel threads exiting. I deliberately decided > > > to not check for refcounts for userland processes. See below. > > > > > > > The other places, where the vm_refcnt is increased, are in process_domem() > > > > (ptrace) and in sysctl_proc_args() both check PS_EXITING upfront but then > > > > sleep while holding the reference. So if the process exits while those are > > > > asleep we also end up in fireworks. For those parts I'm not even sure if > > > > going through exit1, reaper and process_zap before those call uvmspace_free() > > > > would not result in some use-after-free or other bad access. > > > > I try to look at all the logic there and think about the consequences when > > > > it comes to vmspace references. > > > > > > When a process wants to peak at the VM space of a dying one, the > > > vm_map_lock() is what guarantees integrity of the data structures. > > > So in the worst case the VM space will appear empty. > > > > So that would then result in a failed uvm_io operation. I guess that is > > fine. > > > > > I decided to not care about reference counts for userland processes > > > because I want to be sure the process itself will be charged for > > > cleaning its VM space. This can be seen as a huge munmap(2) of the > > > whole address space. > > > > I agree with that. We want the process to do the work in exit1 and not > > have some other thread do this on behalf. > > > > > The reference protects the 'struct vmspace' which needs to stay around > > > for the reasons mentioned above. > > > > I am more worried that some code in this exit1, reaper, process_zap jungle > > would change a value inside struct vmspace that would break the assumption > > you made about the map lock. I had a look and did not spot anything right > > away. > > > > > For that reasons updated diff below skips uvmspace_purge() in exit1() for > > > kernel threads. > > > > This is fine with me. > > > > > Done. I also moved cleanups needed for VMMAP_DEBUG builds where they belong. > > > > Some bikeshed requests below. Apart from that OK claudio@ > > I will throw it onto some systems to test. > > amd64 did a make build and make regress and all is ok. > sparc64 make build still running but this looks good as well. > arm64, hit some problems on my first try, 2nd make build is now started. > Lets see. I am not able to make build on my arm64 with the uvm_purge diff. I think there is some bad interaction with what kettenis did in pmap. A few minutes in the build the system locks up hard. So not sure how to progress here. I should probably try kettenis's diff on top of this and see. -- :wq Claudio