uvm_purge()
On Thu, May 15, 2025 at 01:01:07PM +0200, Martin Pieuchot wrote:
> On 15/05/25(Thu) 11:17, Claudio Jeker wrote:
> > On Mon, May 12, 2025 at 08:42:40PM +0200, Martin Pieuchot wrote:
> > > Diff below moves the tearing down of VM space to exit1(). It implies
> > > that processes will now be charged for cleaning their VM space. As a
> > > result their %sys time will increase.
> > >
> > > uvm_purge() is called in the "top" part of exit1() when the process is
> > > still allowed to sleep. To allow this piece of code to be executed in
> > > parallel, all recursive kernel_lock tokens need to be released.
> > >
> > > In the diff below uvm_purge() is called twice. The first time in
> > > exit1() and the second time, doing nothing but integrity checks, in
> > > uvm_exit(). Another approach could be to just check that the map is
> > > empty. Any preferences?
> > >
> > > Comments? Oks?
> >
> > Comments inline.
>
> Thanks for the review. Answers and new diff below.
>
> >
> > > Index: kern/kern_exit.c
> > > ===================================================================
> > > RCS file: /cvs/src/sys/kern/kern_exit.c,v
> > > diff -u -p -r1.245 kern_exit.c
> > > --- kern/kern_exit.c 2 May 2025 05:04:38 -0000 1.245
> > > +++ kern/kern_exit.c 12 May 2025 18:10:35 -0000
> > > @@ -241,6 +241,14 @@ exit1(struct proc *p, int xexit, int xsi
> > > */
> > > if (pr->ps_pptr->ps_sigacts->ps_sigflags & SAS_NOCLDWAIT)
> > > atomic_setbits_int(&pr->ps_flags, PS_NOZOMBIE);
> > > +
> > > + /* Teardown the virtual address space. */
> > > +#ifdef MULTIPROCESSOR
> > > + __mp_release_all(&kernel_lock);
> > > +#endif
> > > + uvm_purge(pr);
> > > + KERNEL_LOCK();
> > > +
> >
> > I prefer if PS_NOZOMBIE is set at the end (after uvm_purge).
> > I know it should not matter but it makes more sense in my head to mark a
> > process with PS_NOZOMBIE close to the end of exit1.
>
> I agree, such diff is ok mpi@. I won't mix this refactoring with the
> introduction of uvm_purge() though.
Sure, we can do this after this is in.
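Roughly what I had in mind, as an untested sketch against this diff (just the
two chunks of exit1() swapped, nothing else changed):

	/* Teardown the virtual address space. */
#ifdef MULTIPROCESSOR
	__mp_release_all(&kernel_lock);
#endif
	uvm_purge(pr);
	KERNEL_LOCK();

	/* Only now mark the process, close to the end of exit1(). */
	if (pr->ps_pptr->ps_sigacts->ps_sigflags & SAS_NOCLDWAIT)
		atomic_setbits_int(&pr->ps_flags, PS_NOZOMBIE);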
> > > Index: uvm/uvm_glue.c
> > > ===================================================================
> > > RCS file: /cvs/src/sys/uvm/uvm_glue.c,v
> > > diff -u -p -r1.88 uvm_glue.c
> > > --- uvm/uvm_glue.c 21 Mar 2025 13:19:33 -0000 1.88
> > > +++ uvm/uvm_glue.c 12 May 2025 18:05:34 -0000
> > > @@ -286,6 +286,19 @@ uvm_uarea_free(struct proc *p)
> > > }
> > >
> > > /*
> > > + * uvm_purge: teardown a virtual address space
> > > + */
> > > +void
> > > +uvm_purge(struct process *pr)
> > > +{
> > > + struct vmspace *vm = pr->ps_vmspace;
> > > +
> > > + KERNEL_ASSERT_UNLOCKED();
> > > +
> > > + uvmspace_purge(vm);
> > > +}
> >
> > I think here you miss a check. Neither uvm_purge() nor uvmspace_purge()
> > check the vmspace refcount. In my diff I used this here:
> >
> > if (atomic_load_int(&vm->vm_refcnt) != 1)
> > return;
> >
> > At least kthread_exit() calls exit1() with a shared vm_refcnt of proc0
> > and without that check a kthread_exit() would purge the kernel vmspace
> > which is very bad.
>
> Indeed. I forgot about kernel threads exiting. I deliberately decided
> to not check for refcounts for userland processes. See below.
>
> > The other places where the vm_refcnt is increased are process_domem()
> > (ptrace) and sysctl_proc_args(); both check PS_EXITING upfront but then
> > sleep while holding the reference. So if the process exits while those are
> > asleep we also end up in fireworks. For those parts I'm not even sure that
> > going through exit1, reaper and process_zap before those call uvmspace_free()
> > won't result in some use-after-free or other bad access.
> > I'll try to look at all the logic there and think about the consequences when
> > it comes to vmspace references.
>
> When a process wants to peek at the VM space of a dying one, the
> vm_map_lock() is what guarantees integrity of the data structures.
> So in the worst case the VM space will appear empty.
So that would then result in a failed uvm_io operation. I guess that is
fine.
> I decided to not care about reference counts for userland processes
> because I want to be sure the process itself will be charged for
> cleaning its VM space. This can be seen as a huge munmap(2) of the
> whole address space.
I agree with that. We want the process to do the work in exit1 and not
have some other thread do this on its behalf.
> The reference protects the 'struct vmspace' which needs to stay around
> for the reasons mentioned above.
I am more worried that some code in this exit1, reaper, process_zap jungle
would change a value inside struct vmspace that would break the assumption
you made about the map lock. I had a look and did not spot anything right
away.
> For that reason the updated diff below skips uvmspace_purge() in exit1() for
> kernel threads.
This is fine with me.
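The new uvm_glue.c chunk isn't quoted below, but I assume the skip boils down
to something like this sketch in uvm_purge() (whether it keys off PS_SYSTEM or
the vm_refcnt is a detail):

	void
	uvm_purge(struct process *pr)
	{
		struct vmspace *vm = pr->ps_vmspace;

		KERNEL_ASSERT_UNLOCKED();

		/* Kernel threads share proc0's vmspace; never purge that. */
		if (pr->ps_flags & PS_SYSTEM)
			return;

		uvmspace_purge(vm);
	}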
> Done. I also moved cleanups needed for VMMAP_DEBUG builds where they belong.
Some bikeshed requests below. Apart from that OK claudio@
I will throw it onto some systems to test.
> Index: uvm/uvm_map.c
> ===================================================================
> RCS file: /cvs/src/sys/uvm/uvm_map.c,v
> diff -u -p -r1.341 uvm_map.c
> --- uvm/uvm_map.c 21 Apr 2025 14:46:18 -0000 1.341
> +++ uvm/uvm_map.c 15 May 2025 10:56:06 -0000
> @@ -305,7 +305,7 @@ vaddr_t uvm_maxkaddr;
> *
> * Addresses are unique.
> * Entries with start == end may only exist if they are the first entry
> - * (sorted by address) within a free-memory tree.
> + (sorted by address) within a free-memory tree.
You lost a * here.
> */
>
> static inline int
> @@ -2491,6 +2491,24 @@ uvm_map_teardown(struct vm_map *map)
> entry = TAILQ_NEXT(entry, dfree.deadq);
> }
>
> +#ifdef VMMAP_DEBUG
> + numt = numq = 0;
> + RBT_FOREACH(entry, uvm_map_addr, &map->addr)
> + numt++;
> + TAILQ_FOREACH(entry, &dead_entries, dfree.deadq)
> + numq++;
> + KASSERT(numt == numq);
> + /*
> + * Let VMMAP_DEBUG checks work on an empty map if uvm_map_teardown()
> + * is called twice.
> + */
> + map->size = 0;
> + map->min_offset = 0;
> + map->max_offset = 0;
> + map->flags &= ~VM_MAP_ISVMSPACE;
> +#endif
> + /* reinit RB tree since the above removal leaves the tree corrupted. */
> + RBT_INIT(uvm_map_addr, &map->addr);
> vm_map_unlock(map);
>
> /* Remove address selectors. */
> @@ -2503,18 +2521,7 @@ uvm_map_teardown(struct vm_map *map)
> uvm_addr_destroy(map->uaddr_brk_stack);
> map->uaddr_brk_stack = NULL;
>
> -#ifdef VMMAP_DEBUG
> - numt = numq = 0;
> - RBT_FOREACH(entry, uvm_map_addr, &map->addr)
> - numt++;
> - TAILQ_FOREACH(entry, &dead_entries, dfree.deadq)
> - numq++;
> - KASSERT(numt == numq);
> -#endif
> uvm_unmap_detach(&dead_entries, 0);
> -
> - pmap_destroy(map->pmap);
> - map->pmap = NULL;
> }
>
> /*
> @@ -3395,6 +3402,26 @@ uvmspace_addref(struct vmspace *vm)
> atomic_inc_int(&vm->vm_refcnt);
> }
>
> +void
> +uvmspace_purge(struct vmspace *vm)
> +{
> +#ifdef SYSVSHM
> + /* Get rid of any SYSV shared memory segments. */
> + if (vm->vm_shm != NULL) {
> + KERNEL_LOCK();
> + shmexit(vm);
> + KERNEL_UNLOCK();
> + }
> +#endif
> +
> + /*
> + * Lock the map, to wait out all other references to it. delete
> + * all of the mappings and pages they hold, then call the pmap
> + * module to reclaim anything left.
This comment is not quite correct anymore.
At least I think 'then call the pmap module to reclaim anything left.'
should be removed, since that is a reference to the pmap_destroy() call
which is now in uvmspace_free().
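Maybe something along these lines would match the new split (same comment,
minus the pmap sentence):

	/*
	 * Lock the map to wait out all other references to it, then
	 * delete all of the mappings and the pages they hold.
	 */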
> + */
> + uvm_map_teardown(&vm->vm_map);
> +}
> +
> /*
> * uvmspace_free: free a vmspace data structure
> */
> @@ -3402,21 +3429,11 @@ void
> uvmspace_free(struct vmspace *vm)
> {
> if (atomic_dec_int_nv(&vm->vm_refcnt) == 0) {
> - /*
> - * lock the map, to wait out all other references to it. delete
> - * all of the mappings and pages they hold, then call the pmap
> - * module to reclaim anything left.
> - */
> -#ifdef SYSVSHM
> - /* Get rid of any SYSV shared memory segments. */
> - if (vm->vm_shm != NULL) {
> - KERNEL_LOCK();
> - shmexit(vm);
> - KERNEL_UNLOCK();
> - }
> -#endif
> + uvmspace_purge(vm);
> +
I think you could add the pmap comment here if you like, but I have to say
that the code here is now very self-explanatory.
> + pmap_destroy(vm->vm_map.pmap);
> + vm->vm_map.pmap = NULL;
>
> - uvm_map_teardown(&vm->vm_map);
> pool_put(&uvm_vmspace_pool, vm);
> }
> }
>
>
--
:wq Claudio