Index | Thread | Search

From:
Martin Pieuchot <mpi@grenadille.net>
Subject:
Re: Faster _exit(2) for a faster userland: R.I.P the reaper
To:
Mark Kettenis <mark.kettenis@xs4all.nl>, tedu@tedunangst.com, dlg@openbsd.org, visa@openbsd.org, tech@openbsd.org, jan@openbsd.org, ajacoutot@openbsd.org
Date:
Tue, 6 May 2025 11:04:55 +0200

Download raw body.

Thread
On 06/05/25(Tue) 10:46, Claudio Jeker wrote:
> On Mon, May 05, 2025 at 10:46:38PM +0200, Mark Kettenis wrote:
> > > Date: Mon, 5 May 2025 17:25:02 +0200
> > > From: Claudio Jeker <cjeker@diehard.n-r-g.com>
> > > 
> > > On Sun, May 04, 2025 at 08:38:06AM -0600, Theo de Raadt wrote:
> > > > Martin Pieuchot <mpi@grenadille.net> wrote:
> > > > 
> > > > > > I do not understand any of the diff.  It seems to be trying to return a
> > > > > > child to the parent, before resources are cleared.
> > > > > 
> > > > > It doesn't.  Order of operations didn't change.  The KERNEL_LOCK() is
> > > > > there and asserted.
> > > > 
> > > > I did not mention the KERNEL_LOCK().
> > > > 
> > > > > I disagree.  I need to remove the reaper.  That *is* the problem.
> > > > 
> > > > You are using proc0 in an very dangerous way.
> > > > 
> > > > > > I've called this uvmspace_purge().  It would be a bit like
> > > > > > 
> > > > > >      uvm_unmap_remove(p->p_vmspace, 0, VM_MAXUSER_ADDRESS,
> > > > > >          &dead_entries, FALSE, TRUE, TRUE)
> > > > > > 
> > > > > > Except there are clearly optimizations.  The function can sleep.  It is
> > > > > > called quite early in process context.
> > > > > > [...] 
> > > > > 
> > > > > There's no need to add a new function.  Calling uvm_exit() from exit1()
> > > > > is enough and work but that's not buying us much.
> > > > 
> > > > As proc0?  Disagree strongly on this direction.  The process should free
> > > > it's own pages out of it's own map.  It can do that if uvm_exit() is split.
> > > > 
> > > > But you've made up your mind to ignore everything everyone says.  You don't
> > > > even reply to emails about the proposals.  Your public requests for feedback
> > > > come off as fake.
> > > 
> > > Here is a minimal diff implementing uvm_purge() in the dumbest way
> > > possible. It removes the reaper from the hot path without any scary hacks.
> > > 
> > > It only fixes uvm_map_teardown() so it can be called again.
> > 
> > But we shouldn't have to call it again no?
> 
> There is a complication that has to do with VM space sharing (which is
> only used by proc0). So it is easier for now to offload proc0 handling to
> uvm_exit() (which does the refcnt update and check if it hit 0).

Not only that.  Look at uvmspace_addref() usages in /sys/kern.

That means the vmspace can be freed from another process context.

> > > I tried to zap the pmap first (with pmap_remove of the full user address
> > > space) but that is very slow.
> > 
> > Our page tables are typically only populated sparsely.  So this
> > doesn't surprise me.
> 
> Yes, it was a try, I was hoping that pmap_remove would be walk the sparse
> pmap tables efficently. It does not.
>  
> > > I think pmap can be optimized a lot here but we don't have an api
> > > for that yet.
> > 
> > What optimizations are you thinking off?  The only "easy" optimization
> > I found in the teardown was replacing the TLB flushes with a single
> > ASID-based TLB flush on arm64.
> 
> So there are a few things I noticed. uvm_map_teardown() spends about the
> same amount of time in uvm as in pmap. It calls pmap many many times so
> there is a lot of call overhead.

It depends on the address space and architecture, but it is of the same
order of magnitude.  When measuring page fault performances on amd64
it's about 75% in uvm_unmap_detach() which is basically the RB-tree
insertion and 25% in pmap_do_remove().

> The moment uvm purges the userland mappings a split KU arch (arm64,
> sparc64, ...) can nuke the pmap, flush out the tlbs via ASID and then
> instruct uvm_map_teardown() to skip any pmap calls.
> 
> For systems witk K + U in pmap (i386, amd64) the situation is more complex. 
> In that case we can kill all U mappings, even do the ASID flush but we
> need to keep the K mappings around and with that enough of pmap so that
> cpu_switchto() is able to jump off the thread.
> Still doing this will still allow uvm_map_teardown() to skip most pmap
> calls.
> 
> As you said the pmap tables are populated very sparesly and uvm removes
> these blocks in a inefficent way.
>  
> > Speaking about that optimization, I think your diff breaks it.  The
> > optimization relies on calling pmap_deactivate() in cpu_exit() before
> > teardown starts.  But now that you do the teardown in exit1() it
> > happens before that call is made.  So that'll have to be fixed.
> 
> Agreed.