From:
Martin Pieuchot <mpi@grenadille.net>
Subject:
Re: Another attempt to get rid of the reaper
To:
Christian Ludwig <cludwig@mailbox.org>, tech@openbsd.org
Date:
Thu, 18 Sep 2025 17:45:46 +0200

Hello,

First of all, I agree that we're late in the release cycle and that
non-trivial bits shouldn't be committed at this stage.

On 16/09/25(Tue) 11:34, Claudio Jeker wrote:
> On Sun, Sep 14, 2025 at 10:36:51PM +0200, Christian Ludwig wrote:
> > Hi,
> > 
> > this is another attempt to get rid of the dedicated reaper thread.
> 
> Why is this a goal? What problem are you trying to solve with this?

I don't know what Christian's goals are.  Here are the ones I believe
we will all benefit from:

- The first goal is to ensure userland processes pay the price for their
cleanup.  When this is not possible the best option is to make parents
pay for their children.

- Another goal is to reduce latency at exit by removing unnecessary context
switches and contention.

- Another goal is to see the cost of reaping processes, to help us pick the
right algorithms and not only measure the parts we are interested in.

> In my opinion this diff makes the current exit situation worse. Instead of
> having a clear reaper process that does the cleanup of the proc and
> process we now end up delegating this work to init(8) or the parent
> process. Neither are really ideal to do this work.

Please note that the parent process is already doing the cleanup via
process_zap().  This is what we want.  We want all the work not done 
in exit1() to be done in the parent.

> You can not assume the parent will be sitting in wait(2) / dowait6() on
> exit of a child.

If this is not a bug, it's because the parent called sigaction(2) with
SA_NOCLDWAIT.  In this case the child will not become a zombie and should
be reaped by init(8).  Then we can see how much time init(8) spends reaping
non-zombie processes.

>                   Actually you can not assume that the parent will ever
> call wait(2) so this change would allow the collection of many fat zombies
> for no good reason.

This sentence makes no sense.  Zombies are already collected by their
parents, non-zombies are currently collected by the reaper.  In this
regard the suggested diff doesn't change much.  A single process context
still does the cleanup.

If you're worried about parents that do not collect their zombies, this
diff doesn't change anything in that regard and the behavior stays the
same.

> Wait(2) only needs a few bit of information (r- and tusage, signal and exit
> information) to work, so things like the uarea really should be removed
> early on and not linger around until the zombie is collected.

I agree with that.  I'm all for putting as much as possible in exit1().
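
To illustrate how little the parent actually gets back, here is a minimal
sketch using wait4(2).  The helper name collect_child() is hypothetical.
All the parent sees is the status word and a struct rusage; nothing else
from the dead process is needed to satisfy the call.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration): fork a child
 * that exits with `code`, collect it with wait4(2) and return the exit
 * status seen by the parent, or -1 on error.  The resource usage of the
 * child is stored in *ru.
 */
int
collect_child(int code, struct rusage *ru)
{
	pid_t pid;
	int status;

	pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0)
		_exit(code);		/* child: become a zombie */

	/* parent: the status word and rusage are all that come back */
	if (wait4(pid, &status, 0, ru) == -1)
		return -1;

	/* The child called _exit(), so a normal exit is expected here. */
	assert(WIFEXITED(status));
	return WEXITSTATUS(status);
}
```

After collect_child(7, &ru) returns 7, ru.ru_utime and ru.ru_stime hold the
CPU time charged to the child.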

> The reaper right now is no longer a bottle neck, it uses very CPU little
> time.  I agree that moving more from the reaper into exit1() is a good
> thing, like the wakeup signaling to the parent process. But I think this
> goes a few steps to far and introduces complex problems for very little
> benefit.

I disagree with you on both points.  The reaper is still a bottleneck
for many use cases.  I'd be happy to explain further if you're interested.

IMHO Christian's work is awesome.  It is going in the right direction.
Yes there are some points that can be improved but this is much better
than what we currently have.

> For me the reaper thread by itself is not an issue, it helps to finish up
> the tricky bits of cleanup on exit quickly.

The reaper is an issue because:

- it is a bottleneck in the signaling of dead processes which adds latency
- it requires extra context switches which add extra contention and latency
- it is one of the reasons OpenBSD's %sys time is so high
- it makes us blind to the cost of reaping processes (hint: rb-tree)
- all of that prevents tools like dpb(1), which gather data about
  process execution, from doing a better job

If you aren't convinced after reading all of this I'm available to
answer your questions.  For me there's no doubt the reaper should go
away.  Now I'm well aware this is not the time to push forward.

Thanks for reading,
Martin