
From: Martin Pieuchot <mpi@grenadille.net>
Subject: Re: uvm_purge()
To: tech@openbsd.org
Date: Thu, 15 May 2025 13:01:07 +0200


  On 15/05/25(Thu) 11:17, Claudio Jeker wrote:
    > On Mon, May 12, 2025 at 08:42:40PM +0200, Martin Pieuchot wrote:
    > > Diff below moves the tearing down of VM space to exit1().  It implies
    > > that processes will now be charged for cleaning their VM space.  As a
    > > result their %sys time will increase.
    > > 
    > > uvm_purge() is called in the "top" part of exit1() when the process is
    > > still allowed to sleep.  To allow this piece of code to be executed in
    > > parallel, all recursive kernel_lock tokens need to be released. 
    > > 
    > > In the diff below uvm_purge() is called twice.  The first time in
    > > exit1() and the second time, doing nothing but integrity checks, in
    > > uvm_exit().  Another approach could be to just check that the map is
    > > empty.  Any preferences?
    > > 
    > > Comments?  Oks?
    > 
    > Comments inline.
    
    Thanks for the review.  Answers and new diff below.
    
    >  
    > > Index: kern/kern_exit.c
    > > ===================================================================
    > > RCS file: /cvs/src/sys/kern/kern_exit.c,v
    > > diff -u -p -r1.245 kern_exit.c
    > > --- kern/kern_exit.c	2 May 2025 05:04:38 -0000	1.245
    > > +++ kern/kern_exit.c	12 May 2025 18:10:35 -0000
    > > @@ -241,6 +241,14 @@ exit1(struct proc *p, int xexit, int xsi
    > >  		 */
    > >  		if (pr->ps_pptr->ps_sigacts->ps_sigflags & SAS_NOCLDWAIT)
    > >  			atomic_setbits_int(&pr->ps_flags, PS_NOZOMBIE);
    > > +
    > > +		/* Teardown the virtual address space. */
    > > +#ifdef MULTIPROCESSOR
    > > +		__mp_release_all(&kernel_lock);
    > > +#endif
    > > +		uvm_purge(pr);
    > > +		KERNEL_LOCK();
    > > +
    > 
    > I prefer if PS_NOZOMBIE is set at the end (after uvm_purge).
    > I know it should not matter but it makes more sense in my head to mark a
    > process with PS_NOZOMBIE close to the end of exit1.
    
    I agree, such a diff is ok mpi@.  I won't mix this refactoring with the
    introduction of uvm_purge() though.
    
    > >  	}
    > >  
    > >  	p->p_fd = NULL;		/* zap the thread's copy */
    > > Index: uvm/uvm_extern.h
    > > ===================================================================
    > > RCS file: /cvs/src/sys/uvm/uvm_extern.h,v
    > > diff -u -p -r1.180 uvm_extern.h
    > > --- uvm/uvm_extern.h	19 Nov 2024 06:18:26 -0000	1.180
    > > +++ uvm/uvm_extern.h	12 May 2025 18:05:34 -0000
    > > @@ -268,6 +268,7 @@ int			uvm_fault(vm_map_t, vaddr_t, vm_fa
    > >  
    > >  vaddr_t			uvm_uarea_alloc(void);
    > >  void			uvm_uarea_free(struct proc *);
    > > +void			uvm_purge(struct process *);
    > >  void			uvm_exit(struct process *);
    > >  void			uvm_init_limits(struct plimit *);
    > >  boolean_t		uvm_kernacc(caddr_t, size_t, int);
    > > @@ -401,6 +402,7 @@ void			uvmspace_init(struct vmspace *, s
    > >  void			uvmspace_exec(struct proc *, vaddr_t, vaddr_t);
    > >  struct vmspace		*uvmspace_fork(struct process *);
    > >  void			uvmspace_addref(struct vmspace *);
    > > +void			uvmspace_purge(struct vmspace *);
    > >  void			uvmspace_free(struct vmspace *);
    > >  struct vmspace		*uvmspace_share(struct process *);
    > >  int			uvm_share(vm_map_t, vaddr_t, vm_prot_t,
    > > Index: uvm/uvm_glue.c
    > > ===================================================================
    > > RCS file: /cvs/src/sys/uvm/uvm_glue.c,v
    > > diff -u -p -r1.88 uvm_glue.c
    > > --- uvm/uvm_glue.c	21 Mar 2025 13:19:33 -0000	1.88
    > > +++ uvm/uvm_glue.c	12 May 2025 18:05:34 -0000
    > > @@ -286,6 +286,19 @@ uvm_uarea_free(struct proc *p)
    > >  }
    > >  
    > >  /*
    > > + * uvm_purge: teardown a virtual address space
    > > + */
    > > +void
    > > +uvm_purge(struct process *pr)
    > > +{
    > > +	struct vmspace *vm = pr->ps_vmspace;
    > > +
    > > +	KERNEL_ASSERT_UNLOCKED();
    > > +
    > > +	uvmspace_purge(vm);
    > > +}
    > 
    > I think you are missing a check here.  Neither uvm_purge() nor
    > uvmspace_purge() checks the vmspace refcount.  In my diff I used this
    > here:
    > 
    > 	if (atomic_load_int(&vm->vm_refcnt) != 1)
    > 		return;
    > 
    > At least kthread_exit() calls exit1() with proc0's shared vmspace
    > (vm_refcnt > 1), and without that check kthread_exit() would purge
    > the kernel vmspace, which is very bad.
    
    Indeed.  I forgot about kernel threads exiting.  I deliberately decided
    not to check refcounts for userland processes.  See below.
    
    > The other places where the vm_refcnt is increased are process_domem()
    > (ptrace) and sysctl_proc_args(); both check PS_EXITING upfront but
    > then sleep while holding the reference.  So if the process exits while
    > they are asleep we also end up in fireworks.  For those parts I'm not
    > even sure that going through exit1(), the reaper and process_zap()
    > before they call uvmspace_free() wouldn't result in some use-after-free
    > or other bad access.  I'll try to look at all the logic there and think
    > about the consequences when it comes to vmspace references.
    
    When a process wants to peek at the VM space of a dying one, the
    vm_map_lock() is what guarantees the integrity of the data structures.
    So in the worst case the VM space will appear empty.
    
    I decided not to check reference counts for userland processes because
    I want to be sure the process itself will be charged for cleaning its
    VM space.  This can be seen as one huge munmap(2) of the whole address
    space.
    The reference still protects the 'struct vmspace', which needs to stay
    around for the reasons mentioned above.
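
    To illustrate, a hypothetical reader in the spirit of process_domem()
    would follow roughly the pattern below.  This is only a sketch:
    inspect_vmspace() and inspect_entry() are made-up placeholders, and the
    point is merely the vm_map_lock_read()/vm_map_unlock_read() bracket
    around the walk.

	/*
	 * Sketch: a holder of a vmspace reference looking at a possibly
	 * dying process.  The read lock serializes against the
	 * vm_map_lock() taken while purging, so after the teardown the
	 * walk simply finds an empty tree.
	 */
	void
	inspect_vmspace(struct vmspace *vm)
	{
		struct vm_map *map = &vm->vm_map;
		struct vm_map_entry *entry;

		vm_map_lock_read(map);
		RBT_FOREACH(entry, uvm_map_addr, &map->addr)
			inspect_entry(entry);	/* sees nothing once purged */
		vm_map_unlock_read(map);
	}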
    
    For those reasons, the updated diff below skips uvm_purge() in exit1()
    for kernel threads.
    
    > > +
    > > +/*
    > >   * uvm_exit: exit a virtual address space
    > >   */
    > >  void
    > > Index: uvm/uvm_map.c
    > > ===================================================================
    > > RCS file: /cvs/src/sys/uvm/uvm_map.c,v
    > > diff -u -p -r1.341 uvm_map.c
    > > --- uvm/uvm_map.c	21 Apr 2025 14:46:18 -0000	1.341
    > > +++ uvm/uvm_map.c	12 May 2025 18:05:34 -0000
    > > @@ -2491,6 +2491,19 @@ uvm_map_teardown(struct vm_map *map)
    > >  		entry = TAILQ_NEXT(entry, dfree.deadq);
    > >  	}
    > >  
    > > +#ifdef VMMAP_DEBUG
    > > +	numt = numq = 0;
    > > +	RBT_FOREACH(entry, uvm_map_addr, &map->addr)
    > > +		numt++;
    > > +	TAILQ_FOREACH(entry, &dead_entries, dfree.deadq)
    > > +		numq++;
    > > +	KASSERT(numt == numq);
    > > +#endif
    > 
    > I would add a comment here explaining why all these values are reset.
    > It is not fully obvious that the upper parts of uvm_map_teardown() leave
    > the map in an inconsistent state.  In my diff I had this comment here:
    > 
    > 	/* reinit RB tree since the above removal leaves the tree corrupted */
    
    Done.  I also moved the cleanups needed for VMMAP_DEBUG builds to where
    they belong.
    
    Index: kern/kern_exit.c
    ===================================================================
    RCS file: /cvs/src/sys/kern/kern_exit.c,v
    diff -u -p -r1.245 kern_exit.c
    --- kern/kern_exit.c	2 May 2025 05:04:38 -0000	1.245
    +++ kern/kern_exit.c	15 May 2025 10:39:00 -0000
    @@ -241,6 +241,16 @@ exit1(struct proc *p, int xexit, int xsi
     		 */
     		if (pr->ps_pptr->ps_sigacts->ps_sigflags & SAS_NOCLDWAIT)
     			atomic_setbits_int(&pr->ps_flags, PS_NOZOMBIE);
    +
    +		/* Teardown the virtual address space. */
    +		if ((p->p_flag & P_SYSTEM) == 0) {
    +#ifdef MULTIPROCESSOR
    +			__mp_release_all(&kernel_lock);
    +#endif
    +			uvm_purge(pr);
    +			KERNEL_LOCK();
    +		}
    +
     	}
     
     	p->p_fd = NULL;		/* zap the thread's copy */
    Index: uvm/uvm_extern.h
    ===================================================================
    RCS file: /cvs/src/sys/uvm/uvm_extern.h,v
    diff -u -p -r1.180 uvm_extern.h
    --- uvm/uvm_extern.h	19 Nov 2024 06:18:26 -0000	1.180
    +++ uvm/uvm_extern.h	15 May 2025 10:35:58 -0000
    @@ -268,6 +268,7 @@ int			uvm_fault(vm_map_t, vaddr_t, vm_fa
     
     vaddr_t			uvm_uarea_alloc(void);
     void			uvm_uarea_free(struct proc *);
    +void			uvm_purge(struct process *);
     void			uvm_exit(struct process *);
     void			uvm_init_limits(struct plimit *);
     boolean_t		uvm_kernacc(caddr_t, size_t, int);
    @@ -401,6 +402,7 @@ void			uvmspace_init(struct vmspace *, s
     void			uvmspace_exec(struct proc *, vaddr_t, vaddr_t);
     struct vmspace		*uvmspace_fork(struct process *);
     void			uvmspace_addref(struct vmspace *);
    +void			uvmspace_purge(struct vmspace *);
     void			uvmspace_free(struct vmspace *);
     struct vmspace		*uvmspace_share(struct process *);
     int			uvm_share(vm_map_t, vaddr_t, vm_prot_t,
    Index: uvm/uvm_glue.c
    ===================================================================
    RCS file: /cvs/src/sys/uvm/uvm_glue.c,v
    diff -u -p -r1.88 uvm_glue.c
    --- uvm/uvm_glue.c	21 Mar 2025 13:19:33 -0000	1.88
    +++ uvm/uvm_glue.c	15 May 2025 10:37:45 -0000
    @@ -286,6 +286,19 @@ uvm_uarea_free(struct proc *p)
     }
     
     /*
    + * uvm_purge: teardown a virtual address space
    + */
    +void
    +uvm_purge(struct process *pr)
    +{
    +	struct vmspace *vm = pr->ps_vmspace;
    +
    +	KERNEL_ASSERT_UNLOCKED();
    +
    +	uvmspace_purge(vm);
    +}
    +
    +/*
      * uvm_exit: exit a virtual address space
      */
     void
    Index: uvm/uvm_map.c
    ===================================================================
    RCS file: /cvs/src/sys/uvm/uvm_map.c,v
    diff -u -p -r1.341 uvm_map.c
    --- uvm/uvm_map.c	21 Apr 2025 14:46:18 -0000	1.341
    +++ uvm/uvm_map.c	15 May 2025 10:56:06 -0000
    @@ -305,7 +305,7 @@ vaddr_t uvm_maxkaddr;
      *
      * Addresses are unique.
      * Entries with start == end may only exist if they are the first entry
    - * (sorted by address) within a free-memory tree.
    + * (sorted by address) within a free-memory tree.
      */
     
     static inline int
    @@ -2491,6 +2491,24 @@ uvm_map_teardown(struct vm_map *map)
     		entry = TAILQ_NEXT(entry, dfree.deadq);
     	}
     
    +#ifdef VMMAP_DEBUG
    +	numt = numq = 0;
    +	RBT_FOREACH(entry, uvm_map_addr, &map->addr)
    +		numt++;
    +	TAILQ_FOREACH(entry, &dead_entries, dfree.deadq)
    +		numq++;
    +	KASSERT(numt == numq);
    +	/*
    +	 * Let VMMAP_DEBUG checks work on an empty map if uvm_map_teardown()
    +	 * is called twice.
    +	 */
    +	map->size = 0;
    +	map->min_offset = 0;
    +	map->max_offset = 0;
    +	map->flags &= ~VM_MAP_ISVMSPACE;
    +#endif
    +	/* reinit RB tree since the above removal leaves the tree corrupted. */
    +	RBT_INIT(uvm_map_addr, &map->addr);
     	vm_map_unlock(map);
     
     	/* Remove address selectors. */
    @@ -2503,18 +2521,7 @@ uvm_map_teardown(struct vm_map *map)
     	uvm_addr_destroy(map->uaddr_brk_stack);
     	map->uaddr_brk_stack = NULL;
     
    -#ifdef VMMAP_DEBUG
    -	numt = numq = 0;
    -	RBT_FOREACH(entry, uvm_map_addr, &map->addr)
    -		numt++;
    -	TAILQ_FOREACH(entry, &dead_entries, dfree.deadq)
    -		numq++;
    -	KASSERT(numt == numq);
    -#endif
     	uvm_unmap_detach(&dead_entries, 0);
    -
    -	pmap_destroy(map->pmap);
    -	map->pmap = NULL;
     }
     
     /*
    @@ -3395,6 +3402,26 @@ uvmspace_addref(struct vmspace *vm)
     	atomic_inc_int(&vm->vm_refcnt);
     }
     
    +void
    +uvmspace_purge(struct vmspace *vm)
    +{
    +#ifdef SYSVSHM
    +	/* Get rid of any SYSV shared memory segments. */
    +	if (vm->vm_shm != NULL) {
    +		KERNEL_LOCK();
    +		shmexit(vm);
    +		KERNEL_UNLOCK();
    +	}
    +#endif
    +
    +	/*
    +	 * Lock the map, to wait out all other references to it.  delete
    +	 * all of the mappings and pages they hold, then call the pmap
    +	 * module to reclaim anything left.
    +	 */
    +	uvm_map_teardown(&vm->vm_map);
    +}
    +
     /*
      * uvmspace_free: free a vmspace data structure
      */
    @@ -3402,21 +3429,11 @@ void
     uvmspace_free(struct vmspace *vm)
     {
     	if (atomic_dec_int_nv(&vm->vm_refcnt) == 0) {
    -		/*
    -		 * lock the map, to wait out all other references to it.  delete
    -		 * all of the mappings and pages they hold, then call the pmap
    -		 * module to reclaim anything left.
    -		 */
    -#ifdef SYSVSHM
    -		/* Get rid of any SYSV shared memory segments. */
    -		if (vm->vm_shm != NULL) {
    -			KERNEL_LOCK();
    -			shmexit(vm);
    -			KERNEL_UNLOCK();
    -		}
    -#endif
    +		uvmspace_purge(vm);
    +
    +		pmap_destroy(vm->vm_map.pmap);
    +		vm->vm_map.pmap = NULL;
     
    -		uvm_map_teardown(&vm->vm_map);
     		pool_put(&uvm_vmspace_pool, vm);
     	}
     }
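
    To summarize the resulting life cycle (paraphrasing the diff above, not
    quoting it): exit1() now tears the address space down early, while the
    dying process is still around to be charged for the work, and the final
    reference drop has little left to do:

	/*
	 * exit1(p):				dying process, %sys time
	 *     if ((p->p_flag & P_SYSTEM) == 0)
	 *         uvm_purge(pr);		unmaps everything; kernel
	 *					lock released around it
	 * ...
	 * uvmspace_free(vm):			last reference, dropped later
	 *     uvmspace_purge(vm);		only integrity checks by now
	 *     pmap_destroy(vm->vm_map.pmap);
	 *     pool_put(&uvm_vmspace_pool, vm);
	 */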
    
    
    