From: Dave Voutila
Subject: Re: Faster _exit(2) for a faster userland: R.I.P the reaper
To: Mike Larkin
Cc: Mark Kettenis, tedu@tedunangst.com, dlg@openbsd.org, visa@openbsd.org, tech@openbsd.org, jan@openbsd.org, ajacoutot@openbsd.org
Date: Tue, 06 May 2025 13:01:08 -0400

Mike Larkin writes:

> On Tue, May 06, 2025 at 10:46:01AM +0200, Claudio Jeker wrote:
>> On Mon, May 05, 2025 at 10:46:38PM +0200, Mark Kettenis wrote:
>> > > Date: Mon, 5 May 2025 17:25:02 +0200
>> > > From: Claudio Jeker
>> > >
>> > > On Sun, May 04, 2025 at 08:38:06AM -0600, Theo de Raadt wrote:
>> > > > Martin Pieuchot wrote:
>> > > >
>> > > > > > I do not understand any of the diff. It seems to be trying to return a
>> > > > > > child to the parent, before resources are cleared.
>> > > > >
>> > > > > It doesn't. Order of operations didn't change. The KERNEL_LOCK() is
>> > > > > there and asserted.
>> > > >
>> > > > I did not mention the KERNEL_LOCK().
>> > > >
>> > > > > I disagree. I need to remove the reaper. That *is* the problem.
>> > > >
>> > > > You are using proc0 in a very dangerous way.
>> > > >
>> > > > > > I've called this uvmspace_purge(). It would be a bit like
>> > > > > >
>> > > > > > uvm_unmap_remove(p->p_vmspace, 0, VM_MAXUSER_ADDRESS,
>> > > > > >     &dead_entries, FALSE, TRUE, TRUE)
>> > > > > >
>> > > > > > Except there are clearly optimizations. The function can sleep. It is
>> > > > > > called quite early in process context.
>> > > > > > [...]
>> > > > >
>> > > > > There's no need to add a new function. Calling uvm_exit() from exit1()
>> > > > > is enough and works, but that's not buying us much.
>> > > >
>> > > > As proc0? Disagree strongly on this direction. The process should free
>> > > > its own pages out of its own map. It can do that if uvm_exit() is split.
>> > > >
>> > > > But you've made up your mind to ignore everything everyone says. You don't
>> > > > even reply to emails about the proposals. Your public requests for feedback
>> > > > come off as fake.
>> > >
>> > > Here is a minimal diff implementing uvm_purge() in the dumbest way
>> > > possible. It removes the reaper from the hot path without any scary hacks.
>> > >
>> > > It only fixes uvm_map_teardown() so it can be called again.
>> >
>> > But we shouldn't have to call it again, no?
>>
>> There is a complication that has to do with VM space sharing (which is
>> only used by proc0). So it is easier for now to offload proc0 handling to
>> uvm_exit() (which does the refcnt update and check if it hit 0).
>
> Does vmm(4)/vmd(8)'s use of uvm_share matter here? dv@ has done work to
> remove a lot of that nonsense but I just want to point it out in case we
> are considering going this way...
>

I need to update my diff (hopefully today) but the one I'd been working on
removes uvm_share entirely. IIRC uvm_share was sharing maps or map entries
and not the vmspace object as a whole. Not sure if that matters here.

>>
>> I think we can kill VM space sharing and use real threads for kthread.
>> This would remove the vm space refcnt and make the code a fair bit simpler.
>> guenther@ wanted that as well, so maybe I'll look into that.
>>
>> > > I tried to zap the pmap first (with pmap_remove of the full user
>> > > address space) but that is very slow.
>> >
>> > Our page tables are typically only populated sparsely. So this
>> > doesn't surprise me.
>>
>> Yes, it was a try; I was hoping that pmap_remove would walk the sparse
>> pmap tables efficiently. It does not.
>>
>> > > I think pmap can be optimized a lot here but we don't have an API
>> > > for that yet.
>> >
>> > What optimizations are you thinking of? The only "easy" optimization
>> > I found in the teardown was replacing the TLB flushes with a single
>> > ASID-based TLB flush on arm64.
>>
>> So there are a few things I noticed. uvm_map_teardown() spends about the
>> same amount of time in uvm as in pmap. It calls pmap many, many times, so
>> there is a lot of call overhead.
>>
>> The moment uvm purges the userland mappings, a split K/U arch (arm64,
>> sparc64, ...) can nuke the pmap, flush out the TLBs via ASID, and then
>> instruct uvm_map_teardown() to skip any pmap calls.
>>
>> For systems with K + U in pmap (i386, amd64) the situation is more
>> complex. In that case we can kill all U mappings, even do the ASID flush,
>> but we need to keep the K mappings around, and with that enough of pmap
>> so that cpu_switchto() is able to jump off the thread.
>> Still, doing this will allow uvm_map_teardown() to skip most pmap calls.
>>
>> As you said, the pmap tables are populated very sparsely and uvm removes
>> these blocks in an inefficient way.
>>
>> > Speaking about that optimization, I think your diff breaks it. The
>> > optimization relies on calling pmap_deactivate() in cpu_exit() before
>> > teardown starts. But now that you do the teardown in exit1() it
>> > happens before that call is made. So that'll have to be fixed.
>>
>> Agreed.
>>
>> > > Index: kern/kern_exit.c
>> > > ===================================================================
>> > > RCS file: /cvs/src/sys/kern/kern_exit.c,v
>> > > diff -u -p -r1.245 kern_exit.c
>> > > --- kern/kern_exit.c	2 May 2025 05:04:38 -0000	1.245
>> > > +++ kern/kern_exit.c	5 May 2025 15:18:17 -0000
>> > > @@ -235,6 +235,10 @@ exit1(struct proc *p, int xexit, int xsi
>> > >  	free(pr->ps_libcpin.pn_pins, M_PINSYSCALL,
>> > >  	    pr->ps_libcpin.pn_npins * sizeof(u_int));
>> > >  
>> > > +	KERNEL_UNLOCK();
>> > > +	uvm_purge(pr);
>> > > +	KERNEL_LOCK();
>> > > +
>> > >  	/*
>> > >  	 * If parent has the SAS_NOCLDWAIT flag set, we're not
>> > >  	 * going to become a zombie.
>> > > Index: uvm/uvm_extern.h
>> > > ===================================================================
>> > > RCS file: /cvs/src/sys/uvm/uvm_extern.h,v
>> > > diff -u -p -r1.180 uvm_extern.h
>> > > --- uvm/uvm_extern.h	19 Nov 2024 06:18:26 -0000	1.180
>> > > +++ uvm/uvm_extern.h	5 May 2025 15:18:17 -0000
>> > > @@ -269,6 +269,7 @@ int uvm_fault(vm_map_t, vaddr_t, vm_fa
>> > >  vaddr_t	uvm_uarea_alloc(void);
>> > >  void	uvm_uarea_free(struct proc *);
>> > >  void	uvm_exit(struct process *);
>> > > +void	uvm_purge(struct process *);
>> > >  void	uvm_init_limits(struct plimit *);
>> > >  boolean_t	uvm_kernacc(caddr_t, size_t, int);
>> > >  
>> > > Index: uvm/uvm_glue.c
>> > > ===================================================================
>> > > RCS file: /cvs/src/sys/uvm/uvm_glue.c,v
>> > > diff -u -p -r1.88 uvm_glue.c
>> > > --- uvm/uvm_glue.c	21 Mar 2025 13:19:33 -0000	1.88
>> > > +++ uvm/uvm_glue.c	5 May 2025 15:18:17 -0000
>> > > @@ -297,6 +297,29 @@ uvm_exit(struct process *pr)
>> > >  	uvmspace_free(vm);
>> > >  }
>> > >  
>> > > +void
>> > > +uvm_purge(struct process *pr)
>> > > +{
>> > > +	struct vmspace *vm = pr->ps_vmspace;
>> > > +
>> > > +	/* only proc0 uses shared vm space */
>> > > +	if (atomic_load_int(&vm->vm_refcnt) != 1)
>> > > +		return;
>> > > +
>> > > +#ifdef SYSVSHM
>> > > +	/* Get rid of any SYSV shared memory segments. */
>> > > +	if (vm->vm_shm != NULL) {
>> > > +		KERNEL_LOCK();
>> > > +		shmexit(vm);
>> > > +		KERNEL_UNLOCK();
>> > > +	}
>> > > +#endif
>> > > +
>> > > +//	pmap_remove(vm->vm_map.pmap, 0, VM_MAXUSER_ADDRESS);
>> > > +
>> > > +	uvm_map_teardown(&vm->vm_map);
>> > > +}
>> > > +
>> > >  /*
>> > >   * uvm_init_limit: init per-process VM limits
>> > >   *
>> > > Index: uvm/uvm_map.c
>> > > ===================================================================
>> > > RCS file: /cvs/src/sys/uvm/uvm_map.c,v
>> > > diff -u -p -r1.341 uvm_map.c
>> > > --- uvm/uvm_map.c	21 Apr 2025 14:46:18 -0000	1.341
>> > > +++ uvm/uvm_map.c	5 May 2025 15:18:17 -0000
>> > > @@ -138,7 +138,6 @@ int uvm_map_pageable_wire(struct vm_m
>> > >  		    vaddr_t, vaddr_t, int);
>> > >  void	uvm_map_setup_entries(struct vm_map*);
>> > >  void	uvm_map_setup_md(struct vm_map*);
>> > > -void	uvm_map_teardown(struct vm_map*);
>> > >  void	uvm_map_vmspace_update(struct vm_map*,
>> > >  		    struct uvm_map_deadq*, int);
>> > >  void	uvm_map_kmem_grow(struct vm_map*,
>> > > @@ -2490,6 +2489,7 @@ uvm_map_teardown(struct vm_map *map)
>> > >  		/* Update wave-front. */
>> > >  		entry = TAILQ_NEXT(entry, dfree.deadq);
>> > >  	}
>> > > +	RBT_INIT(uvm_map_addr, &map->addr);
>> > >  
>> > >  	vm_map_unlock(map);
>> > >  
>> > > @@ -2512,9 +2512,6 @@ uvm_map_teardown(struct vm_map *map)
>> > >  	KASSERT(numt == numq);
>> > >  #endif
>> > >  	uvm_unmap_detach(&dead_entries, 0);
>> > > -
>> > > -	pmap_destroy(map->pmap);
>> > > -	map->pmap = NULL;
>> > >  }
>> > >  
>> > >  /*
>> > > @@ -3417,6 +3414,9 @@ uvmspace_free(struct vmspace *vm)
>> > >  #endif
>> > >  
>> > >  		uvm_map_teardown(&vm->vm_map);
>> > > +
>> > > +		pmap_destroy(vm->vm_map.pmap);
>> > > +
>> > >  		pool_put(&uvm_vmspace_pool, vm);
>> > >  	}
>> > > }
>> > > Index: uvm/uvm_map.h
>> > > ===================================================================
>> > > RCS file: /cvs/src/sys/uvm/uvm_map.h,v
>> > > diff -u -p -r1.94 uvm_map.h
>> > > --- uvm/uvm_map.h	15 Nov 2024 02:59:23 -0000	1.94
>> > > +++ uvm/uvm_map.h	5 May 2025 15:18:17 -0000
>> > > @@ -359,6 +359,7 @@ void uvm_unmap(struct vm_map *, vaddr_t
>> > >  void	uvm_unmap_detach(struct uvm_map_deadq *, int);
>> > >  int	uvm_unmap_remove(struct vm_map*, vaddr_t, vaddr_t,
>> > >  		    struct uvm_map_deadq *, boolean_t, boolean_t, boolean_t);
>> > > +void	uvm_map_teardown(struct vm_map*);
>> > >  void	uvm_map_set_uaddr(struct vm_map*, struct uvm_addr_state**,
>> > >  		    struct uvm_addr_state*);
>> > >  int	uvm_map_mquery(struct vm_map*, vaddr_t*, vsize_t, voff_t, int);
>> > > 
>> > > 
>> 
>> -- 
>> :wq Claudio
>> 