From:
Claudio Jeker <cjeker@diehard.n-r-g.com>
Subject:
Re: Another attempt to get rid of the reaper
To:
Christian Ludwig <cludwig@mailbox.org>, tech@openbsd.org
Date:
Wed, 8 Oct 2025 17:07:01 +0200

On Thu, Sep 18, 2025 at 05:45:46PM +0200, Martin Pieuchot wrote:
> Hello,
> 
> First of all I agree that we're late in the release cycle and that non-trivial
> bits shouldn't be committed at this stage.

Sorry for the long delay, but since the release was around the corner I put
this at the back of the list.
 
> On 16/09/25(Tue) 11:34, Claudio Jeker wrote:
> > On Sun, Sep 14, 2025 at 10:36:51PM +0200, Christian Ludwig wrote:
> > > Hi,
> > > 
> > > this is another attempt to get rid of the dedicated reaper thread.
> > 
> > Why is this a goal? What problem are you trying to solve with this?
> 
> I don't know what Christian's goals are.  Here are the ones I believe
> we will all benefit from:
> 
> - The first goal is to ensure userland processes pay the price for their
> cleanup.  When this is not possible the best option is to make parents
> pay for their children.

I think moving most of the work into exit1() achieves this goal. It only
makes sense to have parents pay the price if parents are actually around
to collect that penalty quickly. The problem is they are not.
 
> - Another goal is to reduce latency at exit by removing unnecessary context
> switches and contention.

This is a dumb argument. The main source of context switches right now
is rwlocks, especially uobjlk, kmmaplk and a few other UVM-related
rwlocks. Also, by moving the signaling of the parent into exit1(), none of
those context switches matter.

Not sure which contention you are after, but taking away one or two context
switches at process exit will not move the needle.
 
> - Another goal is to see the cost of reaping processes to help us pick the
> right algorithms and not only measure the parts we are interested in.

I think right now the cost is more visible than it would be if it were added
to the wait system call. Ideally the cost of freeing the uarea should be
minimal (which is not true right now).
 
> > In my opinion this diff makes the current exit situation worse. Instead of
> > having a clear reaper process that does the cleanup of the proc and
> > process we now end up delegating this work to init(8) or the parent
> > process. Neither are really ideal to do this work.
> 
> Please note that the parent process is already doing the cleanup via
> process_zap().  This is what we want.  We want all the work not done 
> in exit1() to be done in the parent.

Do we really want that? Do we want to have zombies with full uarea etc
sitting around?
 
> > You can not assume the parent will be sitting in wait(2) / dowait6() on
> > exit of a child.
> 
> If this is not a bug, it's because the parent called sigaction(2) with
> SA_NOCLDWAIT; in this case the child will not become a zombie and should
> be reaped by init(8). Then we can see how much time init(8) spends reaping
> non-zombie processes.

No, not really. There are many cases where the parent is doing lots of work
and is happy to collect its children at some later time, in one go.

I have a big issue with pushing work onto init(8). It is not the right
process to do that. Especially since we have something else that is better
suited to do that. Original Unix did that because there was no better
option, but it is a workaround for a problem that can now be solved with
kthreads or tasks.
 
> >                   Actually you can not assume that the parent will ever
> > call wait(2) so this change would allow the collection of many fat zombies
> > for no good reason.
> 
> This sentence makes no sense.  Zombies are already collected by their
> parents, non-zombies are currently collected by the reaper.  In this
> regard the suggested diff doesn't change much.  A single process context
> still continues to do the cleanup.

Zombies are only collected by their parents if the parents call wait(2).
So it is a big change, since the uarea is now only freed once that happens,
and not shortly after exit as before.
 
> If you're worried about parents that do not collect their zombies, this
> diff doesn't change anything in that regard and the behavior stays the
> same.

It does. See above.
 
> > Wait(2) only needs a few bits of information (r- and tusage, signal and exit
> > information) to work, so things like the uarea really should be removed
> > early on and not linger around until the zombie is collected.
> 
> I agree with that.  I'm all for putting as much as possible in exit1().
> 
> > The reaper right now is no longer a bottleneck; it uses very little CPU
> > time.  I agree that moving more from the reaper into exit1() is a good
> > thing, like the wakeup signaling to the parent process. But I think this
> > goes a few steps too far and introduces complex problems for very little
> > benefit.
> 
> I disagree with you on both points.  The reaper is still a bottleneck
> for many use cases.  I'd be happy to explain further if you're interested.

Very much. Right now I only see the reaper because it hammers locks that
any other process would hammer as well. That needs to be fixed anyway.
 
> IMHO Christian's work is awesome.  It is going in the right direction.
> Yes there are some points that can be improved but this is much better
> than what we currently have.
> 
> > For me the reaper thread by itself is not an issue, it helps to finish up
> > the tricky bits of cleanup on exit quickly.
> 
> The reaper is an issue because:
> 
> - it is a bottleneck in the signaling of dead processes which adds latency

Agreed, I have already mentioned multiple times that this is the important bit
and it should be moved into exit1() to reduce latency to a minimum.

> - it requires extra context switches which add extra contention and latency

I very much doubt this. If the signaling is done in exit1(), most of that
goes away.

> - it is one of the reasons OpenBSD %sys time is so high

Ha ha ha ha. Moving it to dowait6() will keep the %sys at the same level.
This code is still run in the kernel.

> - it makes us blind to the cost of reaping processes (hint rb-tree)

We already reap most of the heavy bits (uvm_purge) in exit1().
Is disposing of an empty vmspace still such a heavy operation?

> - all of that prevents tools like dpb(1), which gather data about
>   process execution, from doing a better job
 
dpb(1) is primarily file system bound. It cannot scale because it hammers
the disks and VFS so hard that everything spins.

> If you aren't convinced after reading all of this I'm available to
> answer your questions.  For me there's no doubt the reaper should go
> away.  Now I'm well aware this is not the time to push forward.

I'm only partially convinced. As I said, let's move the signaling of the
parent to exit1() as a first step. Don't overdo it. Don't abuse init(8).

-- 
:wq Claudio