From: Jeremie Courreges-Anglas <jca@wxcvbn.org>
Subject: Re: One reaper per CPU
To: Claudio Jeker <cjeker@diehard.n-r-g.com>, Christian Ludwig <cludwig@genua.de>, tech@openbsd.org
Date: Thu, 11 Jul 2024 12:54:48 +0200

On Thu, Jul 11, 2024 at 12:46:56PM +0200, Martin Pieuchot wrote:
> On 10/07/24(Wed) 19:28, Jeremie Courreges-Anglas wrote:
> > On Wed, Jul 10, 2024 at 05:51:26PM +0200, Claudio Jeker wrote:
> > > On Wed, Jul 10, 2024 at 05:21:35PM +0200, Christian Ludwig wrote:
> > > > Hi,
> > > > 
> > > > This implements one reaper process per CPU. The benefit is that we can
> > > > switch to a safe stack (the reaper) on process teardown without the need
> > > > to lock data structures between sched_exit() and the local reaper. That
> > > > means the global deadproc mutex goes away. And there is also no need to
> > > > go through idle anymore, because we know that the local reaper is not
> > > > currently running when the dying proc enlists itself.
> > > > 
> > > > I have tested it lightly. It does not fall apart immediately. I do not
> > > > see any performance regressions in my test case, which is recompiling
> > > > the kernel over and over again, but your mileage may vary. So please give
> > > > this some serious testing on your boxes.
> > > > 
> > > I thought everyone agreed that we need fewer reapers and not more.
> > 
> > FWIW Christian and I are sitting next to each other during this
> > hackathon.  We've been chatting about this since yesterday, and I like
> > the per-cpu idea.  I tried to implement it in 2022 but didn't get a
> > working diff.
> > 
> > The per-cpu reaper was art's initial plan when the reaper was
> > introduced in NetBSD.  NetBSD has since reverted their use of a reaper,
> > but the idea still has nice properties:
> > - no need for locks and wakeup() between the exiting process and its
> >   dedicated reaper
> > - no need to garbage-collect exited threads like done on NetBSD or in
> >   my reaper diff, the percpu reaper can proc_free() them right away
> > - bye bye exit2()
> > - idle really is idle
> 
> I'm not convinced.  I thought our goal was to get rid of the reaper to
> make sure processes account for their own cleanup.
>  
> > > All the heavy teardown done in the reaper must be moved to exit1().
> > 
> > True.  But nothing in the per-cpu reaper approach prevents us from
> > moving uvm_exit() from the reaper to exit1().
> 
> Can we start with that then?  What can't be moved into exit1()?

That's my plan after mpi commits his reaper/uvm unlocking diff.

> For what
> do we need 80 new kernel threads on an 80-CPU machine?
> 
> > > After that I think having a single reaper thread is just fine (we could
> > > even just use a task queue for cleaning up the final bits of a proc).
> > 
> > If this per-cpu reaper proves stable, I suspect it would still allow
> > for less contention (locks) and latency (wakeups) than a task.
> 
> Those are guesses.  I'd be more confident with analysis and numbers.

Fair enough!  Consensus in the hackroom is not to explore that option.
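
For readers following along, here is a minimal user-space sketch of the
per-cpu idea as Christian describes it above.  It is not the diff: every
name in it (cpu_model, proc_model, exit_enlist_model, reaper_model,
percpu_reaper_demo) is made up for illustration, and it models neither
the stack switch nor real scheduling.  It only shows why a list that is
private to one CPU needs no mutex, assuming the local reaper never runs
while the dying proc enlists itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NCPU_MODEL 2	/* hypothetical CPU count for the model */

/* Hypothetical stand-in for struct proc: only the list linkage. */
struct proc_model {
	struct proc_model *p_next;
	int p_pid;
};

/* Hypothetical per-CPU state: a private list of dead procs. */
struct cpu_model {
	struct proc_model *ci_deadprocs;	/* touched by this CPU only */
	int ci_reaped;				/* procs freed by this CPU's reaper */
};

/*
 * Model of the exit path: the dying proc enlists itself on the
 * local CPU's list.  No mutex is taken because, by assumption, only
 * code running on this CPU ever touches ci_deadprocs, and the local
 * reaper is not running while the dying proc enlists itself.
 */
static void
exit_enlist_model(struct cpu_model *ci, struct proc_model *p)
{
	p->p_next = ci->ci_deadprocs;
	ci->ci_deadprocs = p;
}

/*
 * Model of the local reaper: drain the private list and free each
 * proc right away.  Returns the number of procs freed in this pass.
 */
static int
reaper_model(struct cpu_model *ci)
{
	struct proc_model *p;
	int n = 0;

	while ((p = ci->ci_deadprocs) != NULL) {
		ci->ci_deadprocs = p->p_next;
		free(p);
		n++;
	}
	ci->ci_reaped += n;
	return n;
}

/* Drive the model: three procs exit on two CPUs, then each reaper runs. */
int
percpu_reaper_demo(void)
{
	struct cpu_model cpus[NCPU_MODEL] = {{ NULL, 0 }, { NULL, 0 }};
	int pids[] = { 100, 101, 102 };
	int i, total = 0;

	for (i = 0; i < 3; i++) {
		struct proc_model *p = malloc(sizeof(*p));
		if (p == NULL)
			abort();
		p->p_pid = pids[i];
		exit_enlist_model(&cpus[i % NCPU_MODEL], p);
	}
	for (i = 0; i < NCPU_MODEL; i++)
		total += reaper_model(&cpus[i]);

	/* CPU 0 saw two exits, CPU 1 saw one. */
	assert(cpus[0].ci_reaped == 2 && cpus[1].ci_reaped == 1);
	return total;
}
```

Calling percpu_reaper_demo() from a main() exercises both model CPUs.
A single global list would instead need a mutex around both the enlist
and the drain, which is the lock the original mail says goes away.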

-- 
jca