
From: Martin Pieuchot <mpi@grenadille.net>
Subject: Re: uvm_purge()
To: Mark Kettenis <mark.kettenis@xs4all.nl>
Cc: tech@openbsd.org
Date: Wed, 14 May 2025 11:55:32 +0200


On 13/05/25(Tue) 22:31, Mark Kettenis wrote:
> > Date: Tue, 13 May 2025 14:21:44 +0200
> > From: Martin Pieuchot <mpi@grenadille.net>
> > 
> > On 12/05/25(Mon) 22:00, Mark Kettenis wrote:
> > > > Date: Mon, 12 May 2025 20:42:40 +0200
> > > > From: Martin Pieuchot <mpi@grenadille.net>
> > > 
> > > Hi Martin,
> > > 
> > > > Diff below moves the tearing down of VM space to exit1().  It implies
> > > > that processes will now be charged for cleaning their VM space.  As a
> > > > result their %sys time will increase.
> > > > 
> > > > uvm_purge() is called in the "top" part of exit1() when the process is
> > > > still allowed to sleep.  To allow this piece of code to be executed in
> > > > parallel, all recursive kernel_lock tokens need to be released. 
> > > > 
> > > > In the diff below uvm_purge() is called twice.  The first time in
> > > > exit1() and the second time, doing nothing but integrity checks, in
> > > > uvm_exit().  Another approach could be to just check that the map is
> > > > empty.  Any preferences?
> > > > 
> > > > Comments?  Oks?
> > > 
> > > As I explained to Claudio in the "Faster _exit(2) for a faster
> > > userland: R.I.P the reaper" thread, this will disable the TLB flushing
> > > optimization I added on arm64 a few months ago.  The uvm_purge() call
> > > needs to come after the userland page tables have been disabled and
> > > the TLBs have been flushed.
> > > 
> > > Unfortunately the way I hooked that operation into pmap_deactivate()
> > > makes this non-trivial.  The easiest way out may be to add a new
> > > pmap_purge() that does this operation and that gets called from an
> > > appropriate spot in exit1().  This must happen after the other
> > > threads of the process have exited, so somewhere after the point
> > > where the main thread waits for them to die.
> > 
> > As long as this only touches userland page tables it can be inside
> > uvm_purge().  
> > 
> > > I can cook you a diff for that.
> > 
> > Thanks!
> > 
> 
> So here is a proof of concept; not sure I'm entirely happy with this
> yet.  But hopefully it gets across what the challenge is.

Lovely.  More input below that will hopefully help simplify this proof
of concept.

> 
> It is important that when we flush the TLB, none of the threads in a
> process have the userland page tables active.  On arm64 the CPUs can
> speculatively load TLB entries even if you don't reference the pages!
> The current code deactivates the page tables in cpu_exit() and uses
> atomics to make sure that the last thread that goes through cpu_exit()
> also flushes the TLB.  At that point none of the threads can sleep, so
> we can simply set the TTBR0_EL1 register to point at a page filled
> with zeroes and don't have to worry about a context switch resetting
> TTBR0_EL1 to point at the userland page tables again.  (We re-use the
> page filled with zeroes from the kernel pmap for that.)
> 
> But uvm_purge() can sleep, so it needs to be called much earlier.  We
> can prevent a context switch from reloading TTBR0_EL1 by also setting
> pm->pm_pt0a to point at that page filled with zeroes.  But if we do
> that for any non-main thread, we run into problems because another
> thread that is still running userland code might context switch and
> end up in an endless loop faulting because it has a page table without
> valid entries in it.
>
> So that's why my new pmap_exit() function gets called in different
> places for the main thread and other threads.  The main thread calls
> pmap_exit() at the place where your diff calls uvm_purge(), so it
> could be rolled into that function.
> 
> I think this strategy will work for other pmaps as well, but I
> probably should look at one or two other ones.

uvm_purge() is executed by the last thread in a process.  When this
happens the other threads might still be at the end of exit1() but none
of them will go back to userland.

I have other diffs to improve the synchronization between the sibling
threads of a process when exiting, mostly to remove unnecessary context
switches.  They build on the current assumption that uvm_purge() is
called once all other threads have cleaned up their state.
This part of my work removes the notion of a 'main thread' and the
P_THREAD flag.  Instead, the last thread of a process to enter exit1()
will clean up the per-process state.

Could you use those two pieces of information to simplify your diff?

> Index: arch/arm64/arm64/pmap.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/arm64/arm64/pmap.c,v
> diff -u -p -r1.111 pmap.c
> --- arch/arm64/arm64/pmap.c	14 Feb 2025 18:36:04 -0000	1.111
> +++ arch/arm64/arm64/pmap.c	13 May 2025 20:09:45 -0000
> @@ -946,6 +946,7 @@ pmap_pinit(pmap_t pm)
>  	pmap_extract(pmap_kernel(), l0va, (paddr_t *)&pm->pm_pt0pa);
>  
>  	pmap_reference(pm);
> +	pm->pm_active = 1;
>  }
>  
>  int pmap_vp_poolcache = 0; /* force vp poolcache to allocate late */
> @@ -1482,7 +1483,6 @@ pmap_activate(struct proc *p)
>  {
>  	pmap_t pm = p->p_vmspace->vm_map.pmap;
>  
> -	atomic_inc_int(&pm->pm_active);
>  	if (p == curproc && pm != curcpu()->ci_curpm)
>  		pmap_setttb(p);
>  }
> @@ -1493,6 +1493,11 @@ pmap_activate(struct proc *p)
>  void
>  pmap_deactivate(struct proc *p)
>  {
> +}
> +
> +void
> +pmap_exit(struct proc *p)
> +{
>  	pmap_t pm = p->p_vmspace->vm_map.pmap;
>  
>  	KASSERT(p == curproc);
> @@ -1500,11 +1505,15 @@ pmap_deactivate(struct proc *p)
>  	WRITE_SPECIALREG(ttbr0_el1, pmap_kernel()->pm_pt0pa);
>  	__asm volatile("isb");
>  
> -	if (atomic_dec_int_nv(&pm->pm_active) > 0)
> +	if ((p->p_flag & P_THREAD))
>  		return;
>  
> +	KASSERT(p->p_p->ps_threadcnt == 0);
> +
>  	cpu_tlb_flush_asid_all((uint64_t)pm->pm_asid << 48);
>  	cpu_tlb_flush_asid_all((uint64_t)(pm->pm_asid | ASID_USER) << 48);
> +	pm->pm_pt0pa = pmap_kernel()->pm_pt0pa;
> +	pm->pm_active = 0;
>  }
>  
>  /*
> Index: arch/arm64/include/pmap.h
> ===================================================================
> RCS file: /cvs/src/sys/arch/arm64/include/pmap.h,v
> diff -u -p -r1.28 pmap.h
> --- arch/arm64/include/pmap.h	3 Feb 2025 17:59:40 -0000	1.28
> +++ arch/arm64/include/pmap.h	13 May 2025 20:09:45 -0000
> @@ -125,6 +125,8 @@ int	pmap_fault_fixup(pmap_t, vaddr_t, vm
>  #define __HAVE_PMAP_MPSAFE_ENTER_COW
>  #define __HAVE_PMAP_POPULATE
>  
> +void	pmap_exit(struct proc *);
> +
>  #endif /* _KERNEL && !_LOCORE */
>  
>  #ifndef _LOCORE
> Index: kern/kern_exit.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_exit.c,v
> diff -u -p -r1.245 kern_exit.c
> --- kern/kern_exit.c	2 May 2025 05:04:38 -0000	1.245
> +++ kern/kern_exit.c	13 May 2025 20:09:48 -0000
> @@ -241,6 +241,8 @@ exit1(struct proc *p, int xexit, int xsi
>  		 */
>  		if (pr->ps_pptr->ps_sigacts->ps_sigflags & SAS_NOCLDWAIT)
>  			atomic_setbits_int(&pr->ps_flags, PS_NOZOMBIE);
> +
> +		pmap_exit(p);
>  	}
>  
>  	p->p_fd = NULL;		/* zap the thread's copy */
> @@ -369,6 +371,7 @@ exit1(struct proc *p, int xexit, int xsi
>  
>  	/* just a thread? check if last one standing. */
>  	if (p->p_flag & P_THREAD) {
> +		pmap_exit(p);
>  		/* scheduler_wait_hook(pr->ps_mainproc, p); XXX */
>  		mtx_enter(&pr->ps_mtx);
>  		pr->ps_exitcnt--;