
From: Claudio Jeker <claudio@openbsd.org>
Subject: Re: uvm_purge()
To: Mark Kettenis <mark.kettenis@xs4all.nl>, tech@openbsd.org
Date: Wed, 21 May 2025 11:14:09 +0200

    On Wed, May 21, 2025 at 08:51:11AM +0200, Martin Pieuchot wrote:
    > On 20/05/25(Tue) 22:04, Mark Kettenis wrote:
    > > > Date: Fri, 16 May 2025 15:58:41 +0200
    > > > From: Martin Pieuchot <mpi@grenadille.net>
    > > > 
    > > > On 15/05/25(Thu) 14:45, Mark Kettenis wrote:
    > > > > > Date: Thu, 15 May 2025 12:34:16 +0200
    > > > > > From: Martin Pieuchot <mpi@grenadille.net>
    > > > > > 
    > > > > > On 15/05/25(Thu) 11:52, Mark Kettenis wrote:
    > > > > > > > Date: Wed, 14 May 2025 11:55:32 +0200
    > > > > > > > From: Martin Pieuchot <mpi@grenadille.net>
    > > > > > > 
    > > > > > > Hi Martin,
    > > > > > > 
    > > > > > > Sorry, I'm a bit slow.  Have concerts coming up so my OpenBSD time is
    > > > > > > a bit limited at the moment.
    > > > > > 
    > > > > > No worries, thanks for your answer!
    > > > > > 
    > > > > > > [...]
    > > > > > > > > It is important that when we flush the TLB, none of the threads in a
    > > > > > > > > process have the userland page tables active.  On arm64 the CPUs can
    > > > > > > > > speculatively load TLB entries even if you don't reference the pages!
    > > > > > > > > The current code deactivates the page tables in cpu_exit() and uses
    > > > > > > > > atomics to make sure that the last thread that goes through cpu_exit()
    > > > > > > > > also flushes the TLB.  At that point none of the threads can sleep, so
    > > > > > > > > we can simply set the TTBR0_EL1 register to point at a page filled
    > > > > > > > > with zeroes and don't have to worry about a context switch resetting
    > > > > > > > > TTBR0_EL1 to point at the userland page tables again.  (We re-use the
    > > > > > > > > page filled with zeroes from the kernel pmap for that.)
    > > > > > > > > 
    > > > > > > > > But uvm_purge() can sleep, so it needs to be called much earlier.  We
    > > > > > > > > can prevent a context switch from reloading TTBR0_EL1 by also setting
    > > > > > > > > pm->pm_pt0a to point at that page filled with zeroes.  But if we do
    > > > > > > > > that for any non-main thread, we run into problems because another
    > > > > > > > > thread that is still running userland code might context switch and
    > > > > > > > > end up in an endless loop faulting because it has a page table without
    > > > > > > > > valid entries in it.
    > > > > > > > >
    > > > > > > > > So that's why my new pmap_exit() function gets called in different
    > > > > > > > > places for the main thread and other threads.  The main thread calls
    > > > > > > > > pmap_exit() at the place where your diff calls uvm_purge(), so it
    > > > > > > > > could be rolled into that function.
    > > > > > > > > 
    > > > > > > > > I think this strategy will work for other pmaps as well, but I
    > > > > > > > > probably should look at one or two other ones.
    > > > > > > > 
    > > > > > > > uvm_purge() is executed by the last thread in a process.  When this
    > > > > > > > happens the other threads might still be at the end of exit1() but none
    > > > > > > > of them will go back to userland.
    > > > > > > > 
    > > > > > > > I have other diffs to improve the synchronization between the siblings
    > > > > > > > of a process when exiting, mostly to remove unnecessary context switches.
    > > > > > > > They are built on the current assumption that uvm_purge() is called when
    > > > > > > > all other threads have cleaned their states.
    > > > > > > > This part of my work removes the notion of 'main thread' and the
    > > > > > > > P_THREAD flag.  Instead the last thread of a process to enter exit1()
    > > > > > > > will clean the per-process states.  
    > > > > > > > 
    > > > > > > > Could you use those two pieces of information to simplify your diff?
    > > > > > > 
    > > > > > > This suggests that I really should have two separate functions, one
    > > > > > > which gets called for each thread exit (which disables the userland
    > > > > > > page tables) and one that gets called from uvm_purge() (which does the
    > > > > > > TLB flush and can clean up the pmap in the future).  That way I don't
    > > > > > > have to rely on P_THREAD to determine what to do.
    > > > > > 
    > > > > > I agree.
    > > > > > 
    > > > > > > The latter function should probably be called pmap_purge() and it is
    > > > > > > fine if we call it for the "last thread in the process" instead of
    > > > > > > what we currently consider the "main thread".  But this function still
    > > > > > > needs to make sure it runs after the other threads have disabled their
    > > > > > > userland page tables.  And as you point out, at the point where
    > > > > > > uvm_purge() gets called the other threads might still be at the tail
    > > > > > > end of exit1().
    > > > > > > 
    > > > > > > I'm a bit hesitant to add yet another "public" pmap interface, so it
    > > > > > > would be nice to have cpu_exit() handle the MD-specifics of disabling
    > > > > > > the userland page tables.  I could put back the atomics to manipulate
    > > > > > > pm_active and keep that code in pmap_deactivate().  The ordering issue
    > > > > > > between calling uvm_purge() and the other threads going through
    > > > > > > cpu_exit() is largely theoretical as there isn't much code between the
    > > > > > > wakeup(&pr->ps_threads) and the cpu_exit() call.  And I could make
    > > > > > > pmap_purge() block until pm_active becomes 1 to make sure the other
    > > > > > > threads have gone through cpu_exit().
    > > > > > 
    > > > > > I agree being able to use pmap_deactivate() is cleaner.
    > > > > > 
    > > > > > I'd like to avoid adding another barrier in exit1().  I'm actually
    > > > > > working hard to remove them as much as possible to reduce existing
    > > > > > latency.
    > > > > 
    > > > > The added "barrier" in pmap_purge() would be something like:
    > > > > 
    > > > >         while (pm->pm_active != 1)
    > > > > 	        CPU_BUSY_CYCLE();
    > > > > 
    > > > > And I don't think we'd execute that loop under normal conditions.  We
    > > > > could put that in as a temporary measure until the rest of this is
    > > > > figured out to make sure arm64 isn't blocking your progress.  We can
    > > > > replace the while loop with a KASSERT when the time comes to do that.
    > > > 
    > > > Fine with me.
    > > > 
    > > > > Alternatively we could skip the pmap_purge() optimization if the other
    > > > > threads haven't gone through cpu_exit() yet.
    > > > > 
    > > > > > I can send my other diff with the signaling to wake up the last thread
    > > > > > after cpu_exit().  This will require some plumbing to move sched_exit()
    > > > > > out of cpu_exit().
    > > > > 
    > > > > Yes, I think moving the sched_exit() out of cpu_exit() and into
    > > > > exit1() would make sense.  I wonder if we should try to move
    > > > > pmap_deactivate() into exit1() as well?  Almost all cpu_exit()
    > > > > implementations call it just before sched_exit().
    > > > 
    > > > Moving pmap_deactivate() as well makes sense to me.  I'll do it.
    > > > 
    > > > > Anyway, diff below is something I'm happier with.
    > > > 
    > > > Running with it as well.
    > > > 
    > > > ok mpi@
    > > 
    > > I've added a comment explaining the (need for a) barrier.  If there
    > > are no objections to introducing yet another pmap interface, I'll go
    > > ahead with this.
    > 
    > I believe that's what we want.
    > 
    > ok mpi@
    
    Agreed, we can work on this in tree.
    We want to have this pmap interface on most archs at some point, so I doubt
    this is a problem.  The pmap API was built when MP meant 2 or maybe 4
    CPUs, not 16, 80 or 256.  There are a lot of things we need to adjust for
    such modern systems.
     
    > > Index: arch/arm64/arm64/pmap.c
    > > ===================================================================
    > > RCS file: /cvs/src/sys/arch/arm64/arm64/pmap.c,v
    > > diff -u -p -r1.111 pmap.c
    > > --- arch/arm64/arm64/pmap.c	14 Feb 2025 18:36:04 -0000	1.111
    > > +++ arch/arm64/arm64/pmap.c	20 May 2025 20:03:10 -0000
    > > @@ -1497,14 +1497,39 @@ pmap_deactivate(struct proc *p)
    > >  
    > >  	KASSERT(p == curproc);
    > >  
    > > +	if (pm->pm_active == 0)
    > > +		return;
    > > +
    > >  	WRITE_SPECIALREG(ttbr0_el1, pmap_kernel()->pm_pt0pa);
    > >  	__asm volatile("isb");
    > >  
    > > -	if (atomic_dec_int_nv(&pm->pm_active) > 0)
    > > -		return;
    > > +	atomic_dec_int(&pm->pm_active);
    > > +}
    > > +
    > > +void
    > > +pmap_purge(struct proc *p)
    > > +{
    > > +	pmap_t pm = p->p_vmspace->vm_map.pmap;
    > > +
    > > +	KASSERT(p->p_p->ps_threadcnt == 0);
    > > +	KASSERT(p == curproc);
    > > +
    > > +	/*
    > > +	 * There is a theoretical chance that our sibling threads are
    > > +	 * still making their way through the tail end of exit1().
    > > +	 * Make absolutely sure they have made it past the point where
    > > +	 * they disable their userland page tables.
    > > +	 */
    > > +	while (pm->pm_active != 1)
    > > +		CPU_BUSY_CYCLE();
    > > +
    > > +	WRITE_SPECIALREG(ttbr0_el1, pmap_kernel()->pm_pt0pa);
    > > +	__asm volatile("isb");
    > >  
    > >  	cpu_tlb_flush_asid_all((uint64_t)pm->pm_asid << 48);
    > >  	cpu_tlb_flush_asid_all((uint64_t)(pm->pm_asid | ASID_USER) << 48);
    > > +	pm->pm_pt0pa = pmap_kernel()->pm_pt0pa;
    > > +	pm->pm_active = 0;
    > >  }
    > >  
    > >  /*
    > > Index: arch/arm64/include/pmap.h
    > > ===================================================================
    > > RCS file: /cvs/src/sys/arch/arm64/include/pmap.h,v
    > > diff -u -p -r1.28 pmap.h
    > > --- arch/arm64/include/pmap.h	3 Feb 2025 17:59:40 -0000	1.28
    > > +++ arch/arm64/include/pmap.h	20 May 2025 20:03:10 -0000
    > > @@ -124,6 +124,7 @@ int	pmap_fault_fixup(pmap_t, vaddr_t, vm
    > >  
    > >  #define __HAVE_PMAP_MPSAFE_ENTER_COW
    > >  #define __HAVE_PMAP_POPULATE
    > > +#define __HAVE_PMAP_PURGE
    > >  
    > >  #endif /* _KERNEL && !_LOCORE */
    > >  
    > 
    > 
    
    -- 
    :wq Claudio
    
    