
From: Mark Kettenis <mark.kettenis@xs4all.nl>
Subject: Re: Newest uvm_purge() to try
To: Claudio Jeker <cjeker@diehard.n-r-g.com>
Cc: tech@openbsd.org
Date: Mon, 02 Jun 2025 14:40:27 +0200

> Date: Mon, 2 Jun 2025 13:45:16 +0200
> From: Claudio Jeker <cjeker@diehard.n-r-g.com>
> 
> On Mon, Jun 02, 2025 at 11:32:08AM +0200, Mark Kettenis wrote:
> > > Date: Mon, 2 Jun 2025 11:13:52 +0200
> > > From: Claudio Jeker <cjeker@diehard.n-r-g.com>
> > > 
> > > On Mon, Jun 02, 2025 at 10:55:33AM +0200, Martin Pieuchot wrote:
> > > > On 28/05/25(Wed) 22:24, Mark Kettenis wrote:
> > > > > > Date: Sat, 24 May 2025 13:35:16 +0200
> > > > > > From: Mark Kettenis <mark.kettenis@xs4all.nl>
> > > > > > 
> > > > > > > Date: Sat, 24 May 2025 12:52:42 +0200
> > > > > > > From: Martin Pieuchot <mpi@grenadille.net>
> > > > > > > 
> > > > > > > On 23/05/25(Fri) 15:10, Mark Kettenis wrote:
> > > > > > > > > Date: Fri, 23 May 2025 13:15:01 +0200
> > > > > > > > > From: Claudio Jeker <cjeker@diehard.n-r-g.com>
> > > > > > > > > 
> > > > > > > > > On Fri, May 23, 2025 at 10:03:11AM +0200, Martin Pieuchot wrote:
> > > > > > > > > > On 22/05/25(Thu) 14:45, Mark Kettenis wrote:
> > > > > > > > > > > > Date: Thu, 22 May 2025 11:58:38 +0200
> > > > > > > > > > > > From: Martin Pieuchot <mpi@grenadille.net>
> > > > > > > > > > > > 
> > > > > > > > > > > > Here's the latest version incorporating kettenis@'s pmap_purge() and
> > > > > > > > > > > > claudio@'s feedback.  I observed a performance improvement of 10-15%
> > > > > > > > > > > > on workloads using 24 to 48 CPUs.  %sys time obviously increases now
> > > > > > > > > > > > that tearing down the VM space is accounted for.
> > > > > > > > > > > > 
> > > > > > > > > > > > Please test and report back.
> > > > > > > > > > > 
> > > > > > > > > > > Going to play with this!
> > > > > > > > > > > 
> > > > > > > > > > > I have a question I wanted to ask about the diff though...
> > > > > > > > > > > 
> > > > > > > > > > > > Index: kern/kern_exit.c
> > > > > > > > > > > > ===================================================================
> > > > > > > > > > > > RCS file: /cvs/src/sys/kern/kern_exit.c,v
> > > > > > > > > > > > diff -u -p -r1.248 kern_exit.c
> > > > > > > > > > > > --- kern/kern_exit.c	21 May 2025 09:42:59 -0000	1.248
> > > > > > > > > > > > +++ kern/kern_exit.c	22 May 2025 09:30:43 -0000
> > > > > > > > > > > > @@ -242,9 +242,14 @@ exit1(struct proc *p, int xexit, int xsi
> > > > > > > > > > > >  		if (pr->ps_pptr->ps_sigacts->ps_sigflags & SAS_NOCLDWAIT)
> > > > > > > > > > > >  			atomic_setbits_int(&pr->ps_flags, PS_NOZOMBIE);
> > > > > > > > > > > >  
> > > > > > > > > > > > -#ifdef __HAVE_PMAP_PURGE
> > > > > > > > > > > > -		pmap_purge(p);
> > > > > > > > > > > > +		/* Teardown the virtual address space. */
> > > > > > > > > > > > +		if ((p->p_flag & P_SYSTEM) == 0) {
> > > > > > > > > > > > +#ifdef MULTIPROCESSOR
> > > > > > > > > > > > +			__mp_release_all(&kernel_lock);
> > > > > > > > > > > 
> > > > > > > > > > > Why do we need an __mp_release_all() here?  Is that because we have
> > > > > > > > > > > multiple paths into exit1() that have different recurse counts for the
> > > > > > > > > > > kernel lock?
> > > > > > > > > > 
> > > > > > > > > > Exactly.  At least the one that comes from the single-thread suspend
> > > > > > > > > > code might be > 1.
> > > > > > > > > 
> > > > > > > > > Right, single_thread_check_locked() calls exit1() and takes KERNEL_LOCK()
> > > > > > > > > beforehand, since it may run without KERNEL_LOCK().
> > > > > > > > 
> > > > > > > > Hopefully we can fix that some day.  But for now, maybe this needs a
> > > > > > > > comment saying that we might be called recursively locked and that
> > > > > > > > releasing all locks here is ok since exit1() won't return?
> > > > > > > 
> > > > > > > Sure, updated diff below does that.
> > > > > > 
> > > > > > Excellent!  Once this has gone through sufficient testing, I think
> > > > > > this should go in.
> > > > > >
> > > > > 
> > > > > powerpc64 seems to be happy; did some kernel, libc and clang builds
> > > > > without issues.
> > > > 
> > > > So far this version has been tested on amd64, arm64, riscv64 and powerpc64.  
> > > > 
> > > > Any more tests?  Any oks?
> > > 
> > > I ran with this on sparc64 for a long while.
> > > 
> > > OK claudio@
> > >  
> > > One last question: before, we called pmap_purge(p) all the time, and now
> > > it is no longer called for P_SYSTEM procs (aka kthreads).
> > > Does this fix a bug we currently have, or did pmap_purge() on a shared
> > > vmspace not cause any harm?
> > 
> > I guess the
> > 
> >         if ((p->p_flag & P_THREAD) == 0)
> > 
> > check in the current code makes sure pmap_purge() doesn't get called
> > for kthreads?  And the main kernel proc never exits...
> 
> kthreads are full processes with a single thread but shared kernel
> vmspace. So P_THREAD is not set but the vmspace has a refcount > 1 and
> P_SYSTEM should also be set.
> 
> We don't exit kthreads very often but it does happen (e.g. nfsiod and some
> other ones).
> 
> We really should switch kthreads to proper threads but that is one of
> those todo items that look easy but aren't.

Well, in that case the diff fixes a bug.  I think pmap_purge() will
spin if it gets called for an exiting kthread.

> > > > Index: kern/kern_exit.c
> > > > ===================================================================
> > > > RCS file: /cvs/src/sys/kern/kern_exit.c,v
> > > > diff -u -p -r1.249 kern_exit.c
> > > > --- kern/kern_exit.c	24 May 2025 06:49:16 -0000	1.249
> > > > +++ kern/kern_exit.c	24 May 2025 10:52:03 -0000
> > > > @@ -243,9 +243,22 @@ exit1(struct proc *p, int xexit, int xsi
> > > >  		if (pr->ps_pptr->ps_sigacts->ps_sigflags & SAS_NOCLDWAIT)
> > > >  			atomic_setbits_int(&pr->ps_flags, PS_NOZOMBIE);
> > > >  
> > > > -#ifdef __HAVE_PMAP_PURGE
> > > > -		pmap_purge(p);
> > > > +		/* Teardown the virtual address space. */
> > > > +		if ((p->p_flag & P_SYSTEM) == 0) {
> > > > +			/*
> > > > +			 * exit1() might be called with a lock count
> > > > +			 * greater than one and we want to ensure the
> > > > +			 * costly operation of tearing down the VM space
> > > > +			 * is performed unlocked.
> > > > +			 * It is safe to release them all since exit1()
> > > > +			 * will not return.
> > > > +			 */
> > > > +#ifdef MULTIPROCESSOR
> > > > +			__mp_release_all(&kernel_lock);
> > > >  #endif
> > > > +			uvm_purge();
> > > > +			KERNEL_LOCK();
> > > > +		}
> > > >  	}
> > > >  
> > > >  	p->p_fd = NULL;		/* zap the thread's copy */
> > > > Index: uvm/uvm_extern.h
> > > > ===================================================================
> > > > RCS file: /cvs/src/sys/uvm/uvm_extern.h,v
> > > > diff -u -p -r1.183 uvm_extern.h
> > > > --- uvm/uvm_extern.h	21 May 2025 16:59:32 -0000	1.183
> > > > +++ uvm/uvm_extern.h	24 May 2025 10:49:48 -0000
> > > > @@ -268,6 +268,7 @@ int			uvm_fault(vm_map_t, vaddr_t, vm_fa
> > > >  
> > > >  vaddr_t			uvm_uarea_alloc(void);
> > > >  void			uvm_uarea_free(struct proc *);
> > > > +void			uvm_purge(void);
> > > >  void			uvm_exit(struct process *);
> > > >  void			uvm_init_limits(struct plimit *);
> > > >  boolean_t		uvm_kernacc(caddr_t, size_t, int);
> > > > @@ -401,6 +402,7 @@ void			uvmspace_init(struct vmspace *, s
> > > >  void			uvmspace_exec(struct proc *, vaddr_t, vaddr_t);
> > > >  struct vmspace		*uvmspace_fork(struct process *);
> > > >  void			uvmspace_addref(struct vmspace *);
> > > > +void			uvmspace_purge(struct vmspace *);
> > > >  void			uvmspace_free(struct vmspace *);
> > > >  struct vmspace		*uvmspace_share(struct process *);
> > > >  int			uvm_sysctl(int *, u_int, void *, size_t *, 
> > > > Index: uvm/uvm_glue.c
> > > > ===================================================================
> > > > RCS file: /cvs/src/sys/uvm/uvm_glue.c,v
> > > > diff -u -p -r1.90 uvm_glue.c
> > > > --- uvm/uvm_glue.c	20 May 2025 07:02:20 -0000	1.90
> > > > +++ uvm/uvm_glue.c	24 May 2025 10:49:48 -0000
> > > > @@ -286,6 +286,25 @@ uvm_uarea_free(struct proc *p)
> > > >  }
> > > >  
> > > >  /*
> > > > + * uvm_purge: teardown a virtual address space.
> > > > + *
> > > > + * If multi-threaded, must be called by the last thread of a process.
> > > > + */
> > > > +void
> > > > +uvm_purge(void)
> > > > +{
> > > > +	struct proc *p = curproc;
> > > > +	struct vmspace *vm = p->p_vmspace;
> > > > +
> > > > +	KERNEL_ASSERT_UNLOCKED();
> > > > +
> > > > +#ifdef __HAVE_PMAP_PURGE
> > > > +	pmap_purge(p);
> > > > +#endif
> > > > +	uvmspace_purge(vm);
> > > > +}
> > > > +
> > > > +/*
> > > >   * uvm_exit: exit a virtual address space
> > > >   */
> > > >  void
> > > > Index: uvm/uvm_map.c
> > > > ===================================================================
> > > > RCS file: /cvs/src/sys/uvm/uvm_map.c,v
> > > > diff -u -p -r1.344 uvm_map.c
> > > > --- uvm/uvm_map.c	21 May 2025 16:59:32 -0000	1.344
> > > > +++ uvm/uvm_map.c	24 May 2025 10:49:48 -0000
> > > > @@ -2491,6 +2491,24 @@ uvm_map_teardown(struct vm_map *map)
> > > >  		entry = TAILQ_NEXT(entry, dfree.deadq);
> > > >  	}
> > > >  
> > > > +#ifdef VMMAP_DEBUG
> > > > +	numt = numq = 0;
> > > > +	RBT_FOREACH(entry, uvm_map_addr, &map->addr)
> > > > +		numt++;
> > > > +	TAILQ_FOREACH(entry, &dead_entries, dfree.deadq)
> > > > +		numq++;
> > > > +	KASSERT(numt == numq);
> > > > +	/*
> > > > +	 * Let VMMAP_DEBUG checks work on an empty map if uvm_map_teardown()
> > > > +	 * is called twice.
> > > > +	 */
> > > > +	map->size = 0;
> > > > +	map->min_offset = 0;
> > > > +	map->max_offset = 0;
> > > > +	map->flags &= ~VM_MAP_ISVMSPACE;
> > > > +#endif
> > > > +	/* reinit RB tree since the above removal leaves the tree corrupted. */
> > > > +	RBT_INIT(uvm_map_addr, &map->addr);
> > > >  	vm_map_unlock(map);
> > > >  
> > > >  	/* Remove address selectors. */
> > > > @@ -2503,18 +2521,7 @@ uvm_map_teardown(struct vm_map *map)
> > > >  	uvm_addr_destroy(map->uaddr_brk_stack);
> > > >  	map->uaddr_brk_stack = NULL;
> > > >  
> > > > -#ifdef VMMAP_DEBUG
> > > > -	numt = numq = 0;
> > > > -	RBT_FOREACH(entry, uvm_map_addr, &map->addr)
> > > > -		numt++;
> > > > -	TAILQ_FOREACH(entry, &dead_entries, dfree.deadq)
> > > > -		numq++;
> > > > -	KASSERT(numt == numq);
> > > > -#endif
> > > >  	uvm_unmap_detach(&dead_entries, 0);
> > > > -
> > > > -	pmap_destroy(map->pmap);
> > > > -	map->pmap = NULL;
> > > >  }
> > > >  
> > > >  /*
> > > > @@ -3395,6 +3402,24 @@ uvmspace_addref(struct vmspace *vm)
> > > >  	atomic_inc_int(&vm->vm_refcnt);
> > > >  }
> > > >  
> > > > +void
> > > > +uvmspace_purge(struct vmspace *vm)
> > > > +{
> > > > +#ifdef SYSVSHM
> > > > +	/* Get rid of any SYSV shared memory segments. */
> > > > +	if (vm->vm_shm != NULL) {
> > > > +		KERNEL_LOCK();
> > > > +		shmexit(vm);
> > > > +		KERNEL_UNLOCK();
> > > > +	}
> > > > +#endif
> > > > +	/*
> > > > +	 * Lock the map, to wait out all other references to it, then
> > > > +	 * delete all of the mappings and pages they hold.
> > > > +	 */
> > > > +	uvm_map_teardown(&vm->vm_map);
> > > > +}
> > > > +
> > > >  /*
> > > >   * uvmspace_free: free a vmspace data structure
> > > >   */
> > > > @@ -3403,20 +3428,15 @@ uvmspace_free(struct vmspace *vm)
> > > >  {
> > > >  	if (atomic_dec_int_nv(&vm->vm_refcnt) == 0) {
> > > >  		/*
> > > > -		 * lock the map, to wait out all other references to it.  delete
> > > > -		 * all of the mappings and pages they hold, then call the pmap
> > > > -		 * module to reclaim anything left.
> > > > +		 * Sanity check.  Kernel threads never end up here and
> > > > +		 * userland ones already tear down their VM space in
> > > > +		 * exit1().
> > > >  		 */
> > > > -#ifdef SYSVSHM
> > > > -		/* Get rid of any SYSV shared memory segments. */
> > > > -		if (vm->vm_shm != NULL) {
> > > > -			KERNEL_LOCK();
> > > > -			shmexit(vm);
> > > > -			KERNEL_UNLOCK();
> > > > -		}
> > > > -#endif
> > > > +		uvmspace_purge(vm);
> > > > +
> > > > +		pmap_destroy(vm->vm_map.pmap);
> > > > +		vm->vm_map.pmap = NULL;
> > > >  
> > > > -		uvm_map_teardown(&vm->vm_map);
> > > >  		pool_put(&uvm_vmspace_pool, vm);
> > > >  	}
> > > >  }
> > > > 
> > > > 
> > > 
> > > -- 
> > > :wq Claudio
> > > 
> 
> -- 
> :wq Claudio
>