From: Claudio Jeker
Subject: Re: Newest uvm_purge() to try
To: Mark Kettenis
Cc: tech@openbsd.org
Date: Mon, 2 Jun 2025 13:45:16 +0200

On Mon, Jun 02, 2025 at 11:32:08AM +0200, Mark Kettenis wrote:
> > Date: Mon, 2 Jun 2025 11:13:52 +0200
> > From: Claudio Jeker
> > 
> > On Mon, Jun 02, 2025 at 10:55:33AM +0200, Martin Pieuchot wrote:
> > > On 28/05/25(Wed) 22:24, Mark Kettenis wrote:
> > > > > Date: Sat, 24 May 2025 13:35:16 +0200
> > > > > From: Mark Kettenis
> > > > > 
> > > > > > Date: Sat, 24 May 2025 12:52:42 +0200
> > > > > > From: Martin Pieuchot
> > > > > > 
> > > > > > On 23/05/25(Fri) 15:10, Mark Kettenis wrote:
> > > > > > > > Date: Fri, 23 May 2025 13:15:01 +0200
> > > > > > > > From: Claudio Jeker
> > > > > > > > 
> > > > > > > > On Fri, May 23, 2025 at 10:03:11AM +0200, Martin Pieuchot wrote:
> > > > > > > > > On 22/05/25(Thu) 14:45, Mark Kettenis wrote:
> > > > > > > > > > > Date: Thu, 22 May 2025 11:58:38 +0200
> > > > > > > > > > > From: Martin Pieuchot
> > > > > > > > > > > 
> > > > > > > > > > > Here's the latest version incorporating kettenis@'s pmap_purge() and
> > > > > > > > > > > claudio@'s feedback.  I observed a performance improvement of 10-15%
> > > > > > > > > > > on workloads using 24 to 48 CPUs.  %sys time obviously increases now
> > > > > > > > > > > that tearing down the VM space is accounted for.
> > > > > > > > > > > 
> > > > > > > > > > > Please test and report back.
> > > > > > > > > > 
> > > > > > > > > > Going to play with this!
> > > > > > > > > > 
> > > > > > > > > > I have a question I wanted to ask about the diff though...
> > > > > > > > > > 
> > > > > > > > > > > Index: kern/kern_exit.c
> > > > > > > > > > > ===================================================================
> > > > > > > > > > > RCS file: /cvs/src/sys/kern/kern_exit.c,v
> > > > > > > > > > > diff -u -p -r1.248 kern_exit.c
> > > > > > > > > > > --- kern/kern_exit.c	21 May 2025 09:42:59 -0000	1.248
> > > > > > > > > > > +++ kern/kern_exit.c	22 May 2025 09:30:43 -0000
> > > > > > > > > > > @@ -242,9 +242,14 @@ exit1(struct proc *p, int xexit, int xsi
> > > > > > > > > > >  	if (pr->ps_pptr->ps_sigacts->ps_sigflags & SAS_NOCLDWAIT)
> > > > > > > > > > >  		atomic_setbits_int(&pr->ps_flags, PS_NOZOMBIE);
> > > > > > > > > > > 
> > > > > > > > > > > -#ifdef __HAVE_PMAP_PURGE
> > > > > > > > > > > -	pmap_purge(p);
> > > > > > > > > > > +	/* Teardown the virtual address space. */
> > > > > > > > > > > +	if ((p->p_flag & P_SYSTEM) == 0) {
> > > > > > > > > > > +#ifdef MULTIPROCESSOR
> > > > > > > > > > > +		__mp_release_all(&kernel_lock);
> > > > > > > > > > 
> > > > > > > > > > Why do we need an __mp_release_all() here?  Is that because we have
> > > > > > > > > > multiple paths into exit1() that have different recurse counts for the
> > > > > > > > > > kernel lock?
> > > > > > > > > 
> > > > > > > > > Exactly.  At least the one that comes from the single suspend code,
> > > > > > > > > which might be > 1.
> > > > > > > > 
> > > > > > > > Right, single_thread_check_locked() calls exit1() and takes KERNEL_LOCK()
> > > > > > > > beforehand, since it may run without KERNEL_LOCK().
> > > > > > > 
> > > > > > > Hopefully we can fix that some day.  But for now, maybe this needs a
> > > > > > > comment saying that we might be called recursively locked and that
> > > > > > > releasing all locks here is ok since exit1() won't return?
> > > > > > 
> > > > > > Sure, updated diff below does that.
> > > > > 
> > > > > Excellent!  Once this has gone through sufficient testing, I think
> > > > > this should go in.
> > > > 
> > > > powerpc64 seems to be happy; did some kernel, libc and clang builds
> > > > without issues.
> > > 
> > > So far this version has been tested on amd64, arm64, riscv64 and powerpc64.
> > > 
> > > Any more tests?  Any oks?
> > 
> > I ran with this on sparc64 for a long while.
> > 
> > OK claudio@
> > 
> > One last question: before, we called pmap_purge(p) all the time, and now
> > it is no longer called for P_SYSTEM procs (aka kthreads).
> > Does this fix a bug we currently have, or did pmap_purge() on a shared
> > vmspace not cause any harm?
> 
> I guess the
> 
> 	if ((p->p_flag & P_THREAD) == 0)
> 
> check in the current code makes sure pmap_purge() doesn't get called
> for kthreads?  And the main kernel proc never exits...

kthreads are full processes with a single thread but a shared kernel
vmspace.  So P_THREAD is not set, but the vmspace has a refcount > 1 and
P_SYSTEM should also be set.  We don't exit kthreads very often but it
does happen (e.g. nfsiod and some others).
We really should switch kthreads to proper threads, but that is one of
those todo items that look easy but aren't.

> > > Index: kern/kern_exit.c
> > > ===================================================================
> > > RCS file: /cvs/src/sys/kern/kern_exit.c,v
> > > diff -u -p -r1.249 kern_exit.c
> > > --- kern/kern_exit.c	24 May 2025 06:49:16 -0000	1.249
> > > +++ kern/kern_exit.c	24 May 2025 10:52:03 -0000
> > > @@ -243,9 +243,22 @@ exit1(struct proc *p, int xexit, int xsi
> > >  	if (pr->ps_pptr->ps_sigacts->ps_sigflags & SAS_NOCLDWAIT)
> > >  		atomic_setbits_int(&pr->ps_flags, PS_NOZOMBIE);
> > > 
> > > -#ifdef __HAVE_PMAP_PURGE
> > > -	pmap_purge(p);
> > > +	/* Teardown the virtual address space. */
> > > +	if ((p->p_flag & P_SYSTEM) == 0) {
> > > +		/*
> > > +		 * exit1() might be called with a lock count
> > > +		 * greater than one and we want to ensure the
> > > +		 * costly operation of tearing down the VM space
> > > +		 * is performed unlocked.
> > > +		 * It is safe to release them all since exit1()
> > > +		 * will not return.
> > > +		 */
> > > +#ifdef MULTIPROCESSOR
> > > +		__mp_release_all(&kernel_lock);
> > >  #endif
> > > +		uvm_purge();
> > > +		KERNEL_LOCK();
> > > +	}
> > >  }
> > > 
> > >  	p->p_fd = NULL;		/* zap the thread's copy */
> > > Index: uvm/uvm_extern.h
> > > ===================================================================
> > > RCS file: /cvs/src/sys/uvm/uvm_extern.h,v
> > > diff -u -p -r1.183 uvm_extern.h
> > > --- uvm/uvm_extern.h	21 May 2025 16:59:32 -0000	1.183
> > > +++ uvm/uvm_extern.h	24 May 2025 10:49:48 -0000
> > > @@ -268,6 +268,7 @@ int uvm_fault(vm_map_t, vaddr_t, vm_fa
> > > 
> > >  vaddr_t	uvm_uarea_alloc(void);
> > >  void	uvm_uarea_free(struct proc *);
> > > +void	uvm_purge(void);
> > >  void	uvm_exit(struct process *);
> > >  void	uvm_init_limits(struct plimit *);
> > >  boolean_t	uvm_kernacc(caddr_t, size_t, int);
> > > @@ -401,6 +402,7 @@ void uvmspace_init(struct vmspace *, s
> > >  void	uvmspace_exec(struct proc *, vaddr_t, vaddr_t);
> > >  struct vmspace	*uvmspace_fork(struct process *);
> > >  void	uvmspace_addref(struct vmspace *);
> > > +void	uvmspace_purge(struct vmspace *);
> > >  void	uvmspace_free(struct vmspace *);
> > >  struct vmspace	*uvmspace_share(struct process *);
> > >  int	uvm_sysctl(int *, u_int, void *, size_t *,
> > > Index: uvm/uvm_glue.c
> > > ===================================================================
> > > RCS file: /cvs/src/sys/uvm/uvm_glue.c,v
> > > diff -u -p -r1.90 uvm_glue.c
> > > --- uvm/uvm_glue.c	20 May 2025 07:02:20 -0000	1.90
> > > +++ uvm/uvm_glue.c	24 May 2025 10:49:48 -0000
> > > @@ -286,6 +286,25 @@ uvm_uarea_free(struct proc *p)
> > >  }
> > > 
> > >  /*
> > > + * uvm_purge: teardown a virtual address space.
> > > + *
> > > + * If multi-threaded, must be called by the last thread of a process.
> > > + */
> > > +void
> > > +uvm_purge(void)
> > > +{
> > > +	struct proc *p = curproc;
> > > +	struct vmspace *vm = p->p_vmspace;
> > > +
> > > +	KERNEL_ASSERT_UNLOCKED();
> > > +
> > > +#ifdef __HAVE_PMAP_PURGE
> > > +	pmap_purge(p);
> > > +#endif
> > > +	uvmspace_purge(vm);
> > > +}
> > > +
> > > +/*
> > >   * uvm_exit: exit a virtual address space
> > >   */
> > >  void
> > > Index: uvm/uvm_map.c
> > > ===================================================================
> > > RCS file: /cvs/src/sys/uvm/uvm_map.c,v
> > > diff -u -p -r1.344 uvm_map.c
> > > --- uvm/uvm_map.c	21 May 2025 16:59:32 -0000	1.344
> > > +++ uvm/uvm_map.c	24 May 2025 10:49:48 -0000
> > > @@ -2491,6 +2491,24 @@ uvm_map_teardown(struct vm_map *map)
> > >  		entry = TAILQ_NEXT(entry, dfree.deadq);
> > >  	}
> > > 
> > > +#ifdef VMMAP_DEBUG
> > > +	numt = numq = 0;
> > > +	RBT_FOREACH(entry, uvm_map_addr, &map->addr)
> > > +		numt++;
> > > +	TAILQ_FOREACH(entry, &dead_entries, dfree.deadq)
> > > +		numq++;
> > > +	KASSERT(numt == numq);
> > > +	/*
> > > +	 * Let VMMAP_DEBUG checks work on an empty map if uvm_map_teardown()
> > > +	 * is called twice.
> > > +	 */
> > > +	map->size = 0;
> > > +	map->min_offset = 0;
> > > +	map->max_offset = 0;
> > > +	map->flags &= ~VM_MAP_ISVMSPACE;
> > > +#endif
> > > +	/* reinit RB tree since the above removal leaves the tree corrupted. */
> > > +	RBT_INIT(uvm_map_addr, &map->addr);
> > >  	vm_map_unlock(map);
> > > 
> > >  	/* Remove address selectors.
> > >  	 */
> > > @@ -2503,18 +2521,7 @@ uvm_map_teardown(struct vm_map *map)
> > >  	uvm_addr_destroy(map->uaddr_brk_stack);
> > >  	map->uaddr_brk_stack = NULL;
> > > 
> > > -#ifdef VMMAP_DEBUG
> > > -	numt = numq = 0;
> > > -	RBT_FOREACH(entry, uvm_map_addr, &map->addr)
> > > -		numt++;
> > > -	TAILQ_FOREACH(entry, &dead_entries, dfree.deadq)
> > > -		numq++;
> > > -	KASSERT(numt == numq);
> > > -#endif
> > >  	uvm_unmap_detach(&dead_entries, 0);
> > > -
> > > -	pmap_destroy(map->pmap);
> > > -	map->pmap = NULL;
> > >  }
> > > 
> > >  /*
> > > @@ -3395,6 +3402,24 @@ uvmspace_addref(struct vmspace *vm)
> > >  	atomic_inc_int(&vm->vm_refcnt);
> > >  }
> > > 
> > > +void
> > > +uvmspace_purge(struct vmspace *vm)
> > > +{
> > > +#ifdef SYSVSHM
> > > +	/* Get rid of any SYSV shared memory segments. */
> > > +	if (vm->vm_shm != NULL) {
> > > +		KERNEL_LOCK();
> > > +		shmexit(vm);
> > > +		KERNEL_UNLOCK();
> > > +	}
> > > +#endif
> > > +	/*
> > > +	 * Lock the map, to wait out all other references to it.  Delete
> > > +	 * all of the mappings and pages they hold.
> > > +	 */
> > > +	uvm_map_teardown(&vm->vm_map);
> > > +}
> > > +
> > >  /*
> > >   * uvmspace_free: free a vmspace data structure
> > >   */
> > > @@ -3403,20 +3428,15 @@ uvmspace_free(struct vmspace *vm)
> > >  {
> > >  	if (atomic_dec_int_nv(&vm->vm_refcnt) == 0) {
> > >  		/*
> > > -		 * lock the map, to wait out all other references to it. delete
> > > -		 * all of the mappings and pages they hold, then call the pmap
> > > -		 * module to reclaim anything left.
> > > +		 * Sanity check.  Kernel threads never end up here and
> > > +		 * userland ones already tear down their VM space in
> > > +		 * exit1().
> > >  		 */
> > > -#ifdef SYSVSHM
> > > -		/* Get rid of any SYSV shared memory segments.
> > > -		 */
> > > -		if (vm->vm_shm != NULL) {
> > > -			KERNEL_LOCK();
> > > -			shmexit(vm);
> > > -			KERNEL_UNLOCK();
> > > -		}
> > > -#endif
> > > +		uvmspace_purge(vm);
> > > +
> > > +		pmap_destroy(vm->vm_map.pmap);
> > > +		vm->vm_map.pmap = NULL;
> > > 
> > > -		uvm_map_teardown(&vm->vm_map);
> > >  		pool_put(&uvm_vmspace_pool, vm);
> > >  	}
> > >  }
> > > 
> > 
> > -- 
> > :wq Claudio

-- 
:wq Claudio