From: Mark Kettenis
Subject: Re: Newest uvm_purge() to try
To: Claudio Jeker
Cc: tech@openbsd.org
Date: Mon, 02 Jun 2025 14:40:27 +0200

> Date: Mon, 2 Jun 2025 13:45:16 +0200
> From: Claudio Jeker
> 
> On Mon, Jun 02, 2025 at 11:32:08AM +0200, Mark Kettenis wrote:
> > > Date: Mon, 2 Jun 2025 11:13:52 +0200
> > > From: Claudio Jeker
> > > 
> > > On Mon, Jun 02, 2025 at 10:55:33AM +0200, Martin Pieuchot wrote:
> > > > On 28/05/25(Wed) 22:24, Mark Kettenis wrote:
> > > > > > Date: Sat, 24 May 2025 13:35:16 +0200
> > > > > > From: Mark Kettenis
> > > > > > 
> > > > > > > Date: Sat, 24 May 2025 12:52:42 +0200
> > > > > > > From: Martin Pieuchot
> > > > > > > 
> > > > > > > On 23/05/25(Fri) 15:10, Mark Kettenis wrote:
> > > > > > > > > Date: Fri, 23 May 2025 13:15:01 +0200
> > > > > > > > > From: Claudio Jeker
> > > > > > > > > 
> > > > > > > > > On Fri, May 23, 2025 at 10:03:11AM +0200, Martin Pieuchot wrote:
> > > > > > > > > > On 22/05/25(Thu) 14:45, Mark Kettenis wrote:
> > > > > > > > > > > > Date: Thu, 22 May 2025 11:58:38 +0200
> > > > > > > > > > > > From: Martin Pieuchot
> > > > > > > > > > > > 
> > > > > > > > > > > > Here's the latest version incorporating kettenis@'s pmap_purge() and
> > > > > > > > > > > > claudio@'s feedback.  I observed a performance improvement of 10-15%
> > > > > > > > > > > > on workloads using 24 to 48 CPUs.  %sys time obviously increases now
> > > > > > > > > > > > that tearing down the VM space is accounted for.
> > > > > > > > > > > > 
> > > > > > > > > > > > Please test and report back.
> > > > > > > > > > > 
> > > > > > > > > > > Going to play with this!
> > > > > > > > > > > 
> > > > > > > > > > > I have a question I wanted to ask about the diff though...
> > > > > > > > > > > 
> > > > > > > > > > > > Index: kern/kern_exit.c
> > > > > > > > > > > > ===================================================================
> > > > > > > > > > > > RCS file: /cvs/src/sys/kern/kern_exit.c,v
> > > > > > > > > > > > diff -u -p -r1.248 kern_exit.c
> > > > > > > > > > > > --- kern/kern_exit.c	21 May 2025 09:42:59 -0000	1.248
> > > > > > > > > > > > +++ kern/kern_exit.c	22 May 2025 09:30:43 -0000
> > > > > > > > > > > > @@ -242,9 +242,14 @@ exit1(struct proc *p, int xexit, int xsi
> > > > > > > > > > > >  		if (pr->ps_pptr->ps_sigacts->ps_sigflags & SAS_NOCLDWAIT)
> > > > > > > > > > > >  			atomic_setbits_int(&pr->ps_flags, PS_NOZOMBIE);
> > > > > > > > > > > >  
> > > > > > > > > > > > -#ifdef __HAVE_PMAP_PURGE
> > > > > > > > > > > > -	pmap_purge(p);
> > > > > > > > > > > > +		/* Teardown the virtual address space. */
> > > > > > > > > > > > +		if ((p->p_flag & P_SYSTEM) == 0) {
> > > > > > > > > > > > +#ifdef MULTIPROCESSOR
> > > > > > > > > > > > +			__mp_release_all(&kernel_lock);
> > > > > > > > > > > 
> > > > > > > > > > > Why do we need an __mp_release_all() here?  Is that because we have
> > > > > > > > > > > multiple paths into exit1() that have different recurse counts for the
> > > > > > > > > > > kernel lock?
> > > > > > > > > > 
> > > > > > > > > > Exactly.  At least the one that comes from the single suspend code
> > > > > > > > > > might be > 1.
> > > > > > > > > 
> > > > > > > > > Right, single_thread_check_locked() calls exit1() and calls KERNEL_LOCK()
> > > > > > > > > before since it may run without KERNEL_LOCK().
> > > > > > > > 
> > > > > > > > Hopefully we can fix that some day.  But for now, maybe this needs a
> > > > > > > > comment saying that we might be called recursively locked and that
> > > > > > > > releasing all locks here is ok since exit1() won't return?
> > > > > > > 
> > > > > > > Sure, updated diff below does that.
> > > > > > 
> > > > > > Excellent!  Once this has gone through sufficient testing, I think
> > > > > > this should go in.
> > > > > 
> > > > > powerpc64 seems to be happy; did some kernel, libc and clang builds
> > > > > without issues.
> > > > 
> > > > So far this version has been tested on amd64, arm64, riscv64 and powerpc64.
> > > > 
> > > > Any more tests?  Any oks?
> > > 
> > > I ran with this on sparc64 for a long while.
> > > 
> > > OK claudio@
> > > 
> > > One last question: before we called pmap_purge(p) all the time and now it
> > > is no longer called for P_SYSTEM procs (aka kthreads).
> > > Does this fix a bug we currently have or did pmap_purge on a shared
> > > vmspace not cause any harm?
> > 
> > I guess the
> > 
> > 	if ((p->p_flag & P_THREAD) == 0)
> > 
> > check in the current code makes sure pmap_purge() doesn't get called
> > for kthreads?  And the main kernel proc never exits...
> 
> kthreads are full processes with a single thread but a shared kernel
> vmspace.  So P_THREAD is not set but the vmspace has a refcount > 1 and
> P_SYSTEM should also be set.
> 
> We don't exit kthreads very often but it does happen (e.g. nfsiod and some
> other ones).
> 
> We really should switch kthreads to proper threads but that is one of
> those todo items that look easy but aren't.

Well, in that case the diff fixes a bug.  I think pmap_purge() will
spin if it gets called for an exiting kthread.
> > > > Index: kern/kern_exit.c
> > > > ===================================================================
> > > > RCS file: /cvs/src/sys/kern/kern_exit.c,v
> > > > diff -u -p -r1.249 kern_exit.c
> > > > --- kern/kern_exit.c	24 May 2025 06:49:16 -0000	1.249
> > > > +++ kern/kern_exit.c	24 May 2025 10:52:03 -0000
> > > > @@ -243,9 +243,22 @@ exit1(struct proc *p, int xexit, int xsi
> > > >  		if (pr->ps_pptr->ps_sigacts->ps_sigflags & SAS_NOCLDWAIT)
> > > >  			atomic_setbits_int(&pr->ps_flags, PS_NOZOMBIE);
> > > >  
> > > > -#ifdef __HAVE_PMAP_PURGE
> > > > -	pmap_purge(p);
> > > > +		/* Teardown the virtual address space. */
> > > > +		if ((p->p_flag & P_SYSTEM) == 0) {
> > > > +			/*
> > > > +			 * exit1() might be called with a lock count
> > > > +			 * greater than one and we want to ensure the
> > > > +			 * costly operation of tearing down the VM space
> > > > +			 * is performed unlocked.
> > > > +			 * It is safe to release them all since exit1()
> > > > +			 * will not return.
> > > > +			 */
> > > > +#ifdef MULTIPROCESSOR
> > > > +			__mp_release_all(&kernel_lock);
> > > >  #endif
> > > > +			uvm_purge();
> > > > +			KERNEL_LOCK();
> > > > +		}
> > > >  	}
> > > >  
> > > >  	p->p_fd = NULL;		/* zap the thread's copy */
> > > > Index: uvm/uvm_extern.h
> > > > ===================================================================
> > > > RCS file: /cvs/src/sys/uvm/uvm_extern.h,v
> > > > diff -u -p -r1.183 uvm_extern.h
> > > > --- uvm/uvm_extern.h	21 May 2025 16:59:32 -0000	1.183
> > > > +++ uvm/uvm_extern.h	24 May 2025 10:49:48 -0000
> > > > @@ -268,6 +268,7 @@ int	uvm_fault(vm_map_t, vaddr_t, vm_fa
> > > >  
> > > >  vaddr_t			uvm_uarea_alloc(void);
> > > >  void			uvm_uarea_free(struct proc *);
> > > > +void			uvm_purge(void);
> > > >  void			uvm_exit(struct process *);
> > > >  void			uvm_init_limits(struct plimit *);
> > > >  boolean_t		uvm_kernacc(caddr_t, size_t, int);
> > > > @@ -401,6 +402,7 @@ void	uvmspace_init(struct vmspace *, s
> > > >  void			uvmspace_exec(struct proc *, vaddr_t, vaddr_t);
> > > >  struct vmspace		*uvmspace_fork(struct process *);
> > > >  void			uvmspace_addref(struct vmspace *);
> > > > +void			uvmspace_purge(struct vmspace *);
> > > >  void			uvmspace_free(struct vmspace *);
> > > >  struct vmspace		*uvmspace_share(struct process *);
> > > >  int			uvm_sysctl(int *, u_int, void *, size_t *,
> > > > Index: uvm/uvm_glue.c
> > > > ===================================================================
> > > > RCS file: /cvs/src/sys/uvm/uvm_glue.c,v
> > > > diff -u -p -r1.90 uvm_glue.c
> > > > --- uvm/uvm_glue.c	20 May 2025 07:02:20 -0000	1.90
> > > > +++ uvm/uvm_glue.c	24 May 2025 10:49:48 -0000
> > > > @@ -286,6 +286,25 @@ uvm_uarea_free(struct proc *p)
> > > >  }
> > > >  
> > > >  /*
> > > > + * uvm_purge: teardown a virtual address space.
> > > > + *
> > > > + * If multi-threaded, must be called by the last thread of a process.
> > > > + */
> > > > +void
> > > > +uvm_purge(void)
> > > > +{
> > > > +	struct proc *p = curproc;
> > > > +	struct vmspace *vm = p->p_vmspace;
> > > > +
> > > > +	KERNEL_ASSERT_UNLOCKED();
> > > > +
> > > > +#ifdef __HAVE_PMAP_PURGE
> > > > +	pmap_purge(p);
> > > > +#endif
> > > > +	uvmspace_purge(vm);
> > > > +}
> > > > +
> > > > +/*
> > > >   * uvm_exit: exit a virtual address space
> > > >   */
> > > >  void
> > > > Index: uvm/uvm_map.c
> > > > ===================================================================
> > > > RCS file: /cvs/src/sys/uvm/uvm_map.c,v
> > > > diff -u -p -r1.344 uvm_map.c
> > > > --- uvm/uvm_map.c	21 May 2025 16:59:32 -0000	1.344
> > > > +++ uvm/uvm_map.c	24 May 2025 10:49:48 -0000
> > > > @@ -2491,6 +2491,24 @@ uvm_map_teardown(struct vm_map *map)
> > > >  		entry = TAILQ_NEXT(entry, dfree.deadq);
> > > >  	}
> > > >  
> > > > +#ifdef VMMAP_DEBUG
> > > > +	numt = numq = 0;
> > > > +	RBT_FOREACH(entry, uvm_map_addr, &map->addr)
> > > > +		numt++;
> > > > +	TAILQ_FOREACH(entry, &dead_entries, dfree.deadq)
> > > > +		numq++;
> > > > +	KASSERT(numt == numq);
> > > > +	/*
> > > > +	 * Let VMMAP_DEBUG checks work on an empty map if uvm_map_teardown()
> > > > +	 * is called twice.
> > > > +	 */
> > > > +	map->size = 0;
> > > > +	map->min_offset = 0;
> > > > +	map->max_offset = 0;
> > > > +	map->flags &= ~VM_MAP_ISVMSPACE;
> > > > +#endif
> > > > +	/* reinit RB tree since the above removal leaves the tree corrupted. */
> > > > +	RBT_INIT(uvm_map_addr, &map->addr);
> > > >  	vm_map_unlock(map);
> > > >  
> > > >  	/* Remove address selectors. */
> > > > @@ -2503,18 +2521,7 @@ uvm_map_teardown(struct vm_map *map)
> > > >  	uvm_addr_destroy(map->uaddr_brk_stack);
> > > >  	map->uaddr_brk_stack = NULL;
> > > >  
> > > > -#ifdef VMMAP_DEBUG
> > > > -	numt = numq = 0;
> > > > -	RBT_FOREACH(entry, uvm_map_addr, &map->addr)
> > > > -		numt++;
> > > > -	TAILQ_FOREACH(entry, &dead_entries, dfree.deadq)
> > > > -		numq++;
> > > > -	KASSERT(numt == numq);
> > > > -#endif
> > > >  	uvm_unmap_detach(&dead_entries, 0);
> > > > -
> > > > -	pmap_destroy(map->pmap);
> > > > -	map->pmap = NULL;
> > > >  }
> > > >  
> > > >  /*
> > > > @@ -3395,6 +3402,24 @@ uvmspace_addref(struct vmspace *vm)
> > > >  	atomic_inc_int(&vm->vm_refcnt);
> > > >  }
> > > >  
> > > > +void
> > > > +uvmspace_purge(struct vmspace *vm)
> > > > +{
> > > > +#ifdef SYSVSHM
> > > > +	/* Get rid of any SYSV shared memory segments. */
> > > > +	if (vm->vm_shm != NULL) {
> > > > +		KERNEL_LOCK();
> > > > +		shmexit(vm);
> > > > +		KERNEL_UNLOCK();
> > > > +	}
> > > > +#endif
> > > > +	/*
> > > > +	 * Lock the map, to wait out all other references to it.  Delete
> > > > +	 * all of the mappings and pages they hold.
> > > > +	 */
> > > > +	uvm_map_teardown(&vm->vm_map);
> > > > +}
> > > > +
> > > >  /*
> > > >   * uvmspace_free: free a vmspace data structure
> > > >   */
> > > > @@ -3403,20 +3428,15 @@ uvmspace_free(struct vmspace *vm)
> > > >  {
> > > >  	if (atomic_dec_int_nv(&vm->vm_refcnt) == 0) {
> > > >  		/*
> > > > -		 * lock the map, to wait out all other references to it.  delete
> > > > -		 * all of the mappings and pages they hold, then call the pmap
> > > > -		 * module to reclaim anything left.
> > > > +		 * Sanity check.  Kernel threads never end up here and
> > > > +		 * userland ones already tear down their VM space in
> > > > +		 * exit1().
> > > >  		 */
> > > > -#ifdef SYSVSHM
> > > > -		/* Get rid of any SYSV shared memory segments. */
> > > > -		if (vm->vm_shm != NULL) {
> > > > -			KERNEL_LOCK();
> > > > -			shmexit(vm);
> > > > -			KERNEL_UNLOCK();
> > > > -		}
> > > > -#endif
> > > > +		uvmspace_purge(vm);
> > > > +
> > > > +		pmap_destroy(vm->vm_map.pmap);
> > > > +		vm->vm_map.pmap = NULL;
> > > >  
> > > > -		uvm_map_teardown(&vm->vm_map);
> > > >  		pool_put(&uvm_vmspace_pool, vm);
> > > >  	}
> > > >  }
> > > > 
> > > > 
> > > -- 
> > > :wq Claudio
> > > 
> 
> -- 
> :wq Claudio
> 