Index | Thread | Search

From:
Claudio Jeker <cjeker@diehard.n-r-g.com>
Subject:
Re: uvm_purge()
To:
tech@openbsd.org
Date:
Thu, 15 May 2025 11:17:45 +0200

Download raw body.

Thread
  • Claudio Jeker:

    uvm_purge()

  • On Mon, May 12, 2025 at 08:42:40PM +0200, Martin Pieuchot wrote:
    > Diff below moves the tearing down of VM space to exit1().  It implies
    > that processes will now be charged for cleaning their VM space.  As a
    > result their %sys time will increase.
    > 
    > uvm_purge() is called in the "top" part of exit1() when the process is
    > still allowed to sleep.  To allow this piece of code to be executed in
    > parallel, all recursive kernel_lock tokens need to be released. 
    > 
    > In the diff below uvm_purge() is called twice.  The first time in
    > exit1() and the second time, doing nothing but integrity checks, in
    > uvm_exit().  Another approach could be to just check that the map is
    > empty.  Any preferences?
    > 
    > Comments?  Oks?
    
    Comments inline.
     
    > Index: kern/kern_exit.c
    > ===================================================================
    > RCS file: /cvs/src/sys/kern/kern_exit.c,v
    > diff -u -p -r1.245 kern_exit.c
    > --- kern/kern_exit.c	2 May 2025 05:04:38 -0000	1.245
    > +++ kern/kern_exit.c	12 May 2025 18:10:35 -0000
    > @@ -241,6 +241,14 @@ exit1(struct proc *p, int xexit, int xsi
    >  		 */
    >  		if (pr->ps_pptr->ps_sigacts->ps_sigflags & SAS_NOCLDWAIT)
    >  			atomic_setbits_int(&pr->ps_flags, PS_NOZOMBIE);
    > +
    > +		/* Teardown the virtual address space. */
    > +#ifdef MULTIPROCESSOR
    > +		__mp_release_all(&kernel_lock);
    > +#endif
    > +		uvm_purge(pr);
    > +		KERNEL_LOCK();
    > +
    
    I prefer if PS_NOZOMBIE is set at the end (after uvm_purge).
    I know it should not matter but it makes more sense in my head to mark a
    process with PS_NOZOMBIE close to the end of exit1.
    
    >  	}
    >  
    >  	p->p_fd = NULL;		/* zap the thread's copy */
    > Index: uvm/uvm_extern.h
    > ===================================================================
    > RCS file: /cvs/src/sys/uvm/uvm_extern.h,v
    > diff -u -p -r1.180 uvm_extern.h
    > --- uvm/uvm_extern.h	19 Nov 2024 06:18:26 -0000	1.180
    > +++ uvm/uvm_extern.h	12 May 2025 18:05:34 -0000
    > @@ -268,6 +268,7 @@ int			uvm_fault(vm_map_t, vaddr_t, vm_fa
    >  
    >  vaddr_t			uvm_uarea_alloc(void);
    >  void			uvm_uarea_free(struct proc *);
    > +void			uvm_purge(struct process *);
    >  void			uvm_exit(struct process *);
    >  void			uvm_init_limits(struct plimit *);
    >  boolean_t		uvm_kernacc(caddr_t, size_t, int);
    > @@ -401,6 +402,7 @@ void			uvmspace_init(struct vmspace *, s
    >  void			uvmspace_exec(struct proc *, vaddr_t, vaddr_t);
    >  struct vmspace		*uvmspace_fork(struct process *);
    >  void			uvmspace_addref(struct vmspace *);
    > +void			uvmspace_purge(struct vmspace *);
    >  void			uvmspace_free(struct vmspace *);
    >  struct vmspace		*uvmspace_share(struct process *);
    >  int			uvm_share(vm_map_t, vaddr_t, vm_prot_t,
    > Index: uvm/uvm_glue.c
    > ===================================================================
    > RCS file: /cvs/src/sys/uvm/uvm_glue.c,v
    > diff -u -p -r1.88 uvm_glue.c
    > --- uvm/uvm_glue.c	21 Mar 2025 13:19:33 -0000	1.88
    > +++ uvm/uvm_glue.c	12 May 2025 18:05:34 -0000
    > @@ -286,6 +286,19 @@ uvm_uarea_free(struct proc *p)
    >  }
    >  
    >  /*
    > + * uvm_purge: teardown a virtual address space
    > + */
    > +void
    > +uvm_purge(struct process *pr)
    > +{
    > +	struct vmspace *vm = pr->ps_vmspace;
    > +
    > +	KERNEL_ASSERT_UNLOCKED();
    > +
    > +	uvmspace_purge(vm);
    > +}
    
    I think here you miss a check. Neither uvm_purge() nor uvmspace_purge()
    check the vmspace refcount. In my diff I used this here:
    
    	if (atomic_load_int(&vm->vm_refcnt) != 1)
    		return;
    
    At least kthread_exit() calls exit1() with a shared vm_refcnt of proc0
    and without that check a kthread_exit() would purge the kernel vmspace
    which is very bad.
    
    You call uvmspace_purge() in uvmspace_free() which is called by uvm_exit
    so the above early return works since uvmspace_free() decreases the refcnt
    and would then do the work (which should never happen since proc0 never
    drops its last reference).
    
    The other places, where the vm_refcnt is increased, are in process_domem()
    (ptrace) and in sysctl_proc_args() both check PS_EXITING upfront but then
    sleep while holding the reference. So if the process exits while those are
    asleep we also end up in fireworks. For those parts I'm not even sure if
    going through exit1, reaper and process_zap before those call uvmspace_free()
    would not result in some use-after-free or other bad access.
    I try to look at all the logic there and think about the consequences when
    it comes to vmspace references.
    
    > +
    > +/*
    >   * uvm_exit: exit a virtual address space
    >   */
    >  void
    > Index: uvm/uvm_map.c
    > ===================================================================
    > RCS file: /cvs/src/sys/uvm/uvm_map.c,v
    > diff -u -p -r1.341 uvm_map.c
    > --- uvm/uvm_map.c	21 Apr 2025 14:46:18 -0000	1.341
    > +++ uvm/uvm_map.c	12 May 2025 18:05:34 -0000
    > @@ -2491,6 +2491,19 @@ uvm_map_teardown(struct vm_map *map)
    >  		entry = TAILQ_NEXT(entry, dfree.deadq);
    >  	}
    >  
    > +#ifdef VMMAP_DEBUG
    > +	numt = numq = 0;
    > +	RBT_FOREACH(entry, uvm_map_addr, &map->addr)
    > +		numt++;
    > +	TAILQ_FOREACH(entry, &dead_entries, dfree.deadq)
    > +		numq++;
    > +	KASSERT(numt == numq);
    > +#endif
    
    I would add a comment here why all these values are reset.
    It is not fully obvious that the upper parts of uvm_map_teardown() leaves
    the map in a inconsistent state. In my diff I had this comment here:
    
    	/* reinit RB tree since the above removal leaves the tree corrupted */
    
    > +	RBT_INIT(uvm_map_addr, &map->addr);
    > +	map->size = 0;
    > +	map->min_offset = 0;
    > +	map->max_offset = 0;
    > +	map->flags &= ~VM_MAP_ISVMSPACE;
    >  	vm_map_unlock(map);
    >  
    >  	/* Remove address selectors. */
    > @@ -2503,18 +2516,7 @@ uvm_map_teardown(struct vm_map *map)
    >  	uvm_addr_destroy(map->uaddr_brk_stack);
    >  	map->uaddr_brk_stack = NULL;
    >  
    > -#ifdef VMMAP_DEBUG
    > -	numt = numq = 0;
    > -	RBT_FOREACH(entry, uvm_map_addr, &map->addr)
    > -		numt++;
    > -	TAILQ_FOREACH(entry, &dead_entries, dfree.deadq)
    > -		numq++;
    > -	KASSERT(numt == numq);
    > -#endif
    >  	uvm_unmap_detach(&dead_entries, 0);
    > -
    > -	pmap_destroy(map->pmap);
    > -	map->pmap = NULL;
    >  }
    >  
    >  /*
    > @@ -3395,6 +3397,21 @@ uvmspace_addref(struct vmspace *vm)
    >  	atomic_inc_int(&vm->vm_refcnt);
    >  }
    >  
    > +void
    > +uvmspace_purge(struct vmspace *vm)
    > +{
    > +#ifdef SYSVSHM
    > +	/* Get rid of any SYSV shared memory segments. */
    > +	if (vm->vm_shm != NULL) {
    > +		KERNEL_LOCK();
    > +		shmexit(vm);
    > +		KERNEL_UNLOCK();
    > +	}
    > +#endif
    > +
    > +	uvm_map_teardown(&vm->vm_map);
    > +}
    > +
    >  /*
    >   * uvmspace_free: free a vmspace data structure
    >   */
    > @@ -3407,16 +3424,11 @@ uvmspace_free(struct vmspace *vm)
    >  		 * all of the mappings and pages they hold, then call the pmap
    >  		 * module to reclaim anything left.
    >  		 */
    > -#ifdef SYSVSHM
    > -		/* Get rid of any SYSV shared memory segments. */
    > -		if (vm->vm_shm != NULL) {
    > -			KERNEL_LOCK();
    > -			shmexit(vm);
    > -			KERNEL_UNLOCK();
    > -		}
    > -#endif
    > +		uvmspace_purge(vm);
    > +
    > +		pmap_destroy(vm->vm_map.pmap);
    > +		vm->vm_map.pmap = NULL;
    >  
    > -		uvm_map_teardown(&vm->vm_map);
    >  		pool_put(&uvm_vmspace_pool, vm);
    >  	}
    >  }
    > 
    > 
    
    With the refcnt issue fixed I think this looks very good.
    
    -- 
    :wq Claudio
    
    
  • Claudio Jeker:

    uvm_purge()