
From: Martin Pieuchot <mpi@grenadille.net>
Subject: Re: uvm_purge()
To: Mark Kettenis <mark.kettenis@xs4all.nl>, claudio@openbsd.org
Cc: tech@openbsd.org
Date: Thu, 15 May 2025 12:34:16 +0200

On 15/05/25(Thu) 11:52, Mark Kettenis wrote:
> > Date: Wed, 14 May 2025 11:55:32 +0200
> > From: Martin Pieuchot <mpi@grenadille.net>
> 
> Hi Martin,
> 
> Sorry, I'm a bit slow.  Have concerts coming up so my OpenBSD time is
> a bit limited at the moment.

No worries, thanks for your answer!

> [...]
> > > It is important that when we flush the TLB, none of the threads in a
> > > process have the userland page tables active.  On arm64 the CPUs can
> > > speculatively load TLB entries even if you don't reference the pages!
> > > The current code deactivates the page tables in cpu_exit() and uses
> > > atomics to make sure that the last thread that goes through cpu_exit()
> > > also flushes the TLB.  At that point none of the threads can sleep, so
> > > we can simply set the TTBR0_EL1 register to point at a page filled
> > > with zeroes and don't have to worry about a context switch resetting
> > > TTBR0_EL1 to point at the userland page tables again.  (We re-use the
> > > page filled with zeroes from the kernel pmap for that.)
> > > 
> > > But uvm_purge() can sleep, so it needs to be called much earlier.  We
> > > can prevent a context switch from reloading TTBR0_EL1 by also setting
> > > pm->pm_pt0a to point at that page filled with zeroes.  But if we do
> > > that for any non-main thread, we run into problems because another
> > > thread that is still running userland code might context switch and
> > > end up in an endless loop faulting because it has a page table without
> > > valid entries in it.
> > >
> > > So that's why my new pmap_exit() function gets called in different
> > > places for the main thread and other threads.  The main thread calls
> > > pmap_exit() at the place where your diff calls uvm_purge(), so it
> > > could be rolled into that function.
> > > 
> > > I think this strategy will work for other pmaps as well, but I
> > > probably should look at one or two other ones.
> > 
> > uvm_purge() is executed by the last thread in a process.  When this
> > happens the other threads might still be at the end of exit1() but none
> > of them will go back to userland.
> > 
> > I have other diffs to improve the synchronization between the siblings
> > of a process when exiting, mostly to remove unnecessary context switches.
> > They are built on the current assumption that uvm_purge() is called when
> > all other threads have cleaned up their state.
> > This part of my work removes the notion of 'main thread' and the
> > P_THREAD flag.  Instead the last thread of a process to enter exit1()
> > will clean up the per-process state.
> > 
> > Could you use those two pieces of information to simplify your diff?
> 
> This suggests that I really should have two separate functions, one
> which gets called for each thread exit (which disables the userland
> page tables) and one that gets called from uvm_purge() (which does the
> TLB flush and can clean up the pmap in the future).  That way I don't
> have to rely on P_THREAD to determine what to do.

I agree.

> The latter function should probably be called pmap_purge() and it is
> fine if we call it for the "last thread in the process" instead of
> what we currently consider the "main thread".  But this function still
> needs to make sure it runs after the other threads have disabled their
> userland page tables.  And as you point out, at the point where
> uvm_purge() gets called the other threads might still be at the tail
> end of exit1().
> 
> I'm a bit hesitant to add yet another "public" pmap interface, so it
> would be nice to have cpu_exit() handle the MD-specifics of disabling
> the userland page tables.  I could put back the atomics to manipulate
> pm_active and keep that code in pmap_deactivate().  The ordering issue
> between calling uvm_purge() and the other threads going through
> cpu_exit() is largely theoretical as there isn't much code between the
> wakeup(&pr->ps_threads) and the cpu_exit() call.  And I could make
> pmap_purge() block until pm_active becomes 1 to make sure the other
> threads have gone through cpu_exit().

I agree that being able to use pmap_deactivate() is cleaner.
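
For illustration only, the pm_active accounting discussed above could be
modelled in plain userland C roughly like this.  It is a sketch, not the
arm64 pmap code: struct toy_pmap and the toy_pmap_*() helpers are made-up
names, and the only thing taken from the thread is the idea that each
exiting thread atomically drops its reference and that the purge may only
proceed once pm_active has reached 1 (the purging thread itself).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a pmap; only the counter matters here. */
struct toy_pmap {
	atomic_uint pm_active;	/* threads that still have the pmap active */
};

/* A thread starts running on this pmap. */
void
toy_pmap_activate(struct toy_pmap *pm)
{
	atomic_fetch_add(&pm->pm_active, 1);
}

/*
 * An exiting thread has switched away from the userland page tables
 * and drops its reference.  Returns the number of threads that still
 * have the pmap active afterwards.
 */
unsigned int
toy_pmap_deactivate(struct toy_pmap *pm)
{
	unsigned int prev = atomic_fetch_sub(&pm->pm_active, 1);

	assert(prev > 0);	/* more deactivations than activations */
	return prev - 1;
}

/*
 * True once the caller is the only thread left with the pmap active,
 * i.e. the point at which a purge could flush the TLB.  A real
 * implementation would have to wait for this condition; this sketch
 * only exposes the predicate.
 */
bool
toy_pmap_can_purge(struct toy_pmap *pm)
{
	return atomic_load(&pm->pm_active) == 1;
}
```

With three threads activated, toy_pmap_can_purge() stays false until two
of them have gone through toy_pmap_deactivate().  How the last thread
waits for that condition is exactly the part that would add a barrier.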

I'd like to avoid adding another barrier in exit1().  I'm actually
working hard to remove them as much as possible to reduce existing
latency.

I can send my other diff with the signaling to wake up the last thread
after cpu_exit().  This will require some plumbing to move sched_exit()
out of cpu_exit().

Claudio, you have a diff that does that, no?  Do you want to take care of
it or should I?