
From: "Theo de Raadt" <deraadt@openbsd.org>
Subject: Re: ip sysctl atomic
To: Mark Kettenis <mark.kettenis@xs4all.nl>
Cc: Alexander Bluhm <bluhm@openbsd.org>, tech@openbsd.org, claudio@openbsd.org
Date: Sat, 04 May 2024 14:34:53 -0600


I think this sentence in the sysctl(2) manual page is the real source
of this problem:

     Unless explicitly noted below, sysctl() returns a consistent snapshot of
     the data requested.  Consistency is obtained by locking the destination
     buffer into memory so that the data may be copied out without blocking.
     Calls to sysctl() are serialized to avoid deadlock.

What does that even mean, in a modern world?  So the kernel provides a
consistent view.  Great.  But threads and processes don't necessarily
inspect the result immediately, so do they even know the order in which
their calls completed?

I think the locking described here is impossible to do efficiently, and
if that's the case, maybe we should realize that variable-sized things are
too expensive, and start removing sysctls from libc and just use #define
constants instead?  I mean, that's where this leads if we can't make it
efficient.
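A rough C11 sketch of the distinction between fixed-size and variable-sized data: a single-word integer tunable can be read consistently with one atomic load and no lock at all, while a multi-word value needs something more.  For the multi-word case this uses a sequence counter (a seqlock), which is a swapped-in technique and not how sysctl(2) is actually implemented; readers retry instead of blocking or serializing.  All names (example_ttl, seq_pair, seq_pair_read, ...) are hypothetical, and the writer assumes writes are already serialized by some other lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical single-word tunable: one atomic load is a consistent snapshot. */
static _Atomic int example_ttl = 64;

int
read_int_tunable(void)
{
	return atomic_load_explicit(&example_ttl, memory_order_relaxed);
}

void
write_int_tunable(int v)
{
	atomic_store_explicit(&example_ttl, v, memory_order_relaxed);
}

/* Hypothetical multi-word value guarded by a sequence counter. */
struct pair_snapshot {
	long lo;
	long hi;
};

struct seq_pair {
	atomic_uint seq;	/* odd while a write is in progress */
	atomic_long lo;
	atomic_long hi;
};

static struct seq_pair example_pair;	/* static, so zero-initialized */

/* Assumption: callers serialize writers with some other lock. */
void
seq_pair_write(struct seq_pair *p, long lo, long hi)
{
	unsigned s = atomic_load_explicit(&p->seq, memory_order_relaxed);

	assert((s & 1) == 0);	/* no other writer may be active */
	atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->lo, lo, memory_order_relaxed);
	atomic_store_explicit(&p->hi, hi, memory_order_relaxed);
	atomic_store_explicit(&p->seq, s + 2, memory_order_release);
}

/* Readers never block the writer; they retry if a write overlapped. */
struct pair_snapshot
seq_pair_read(struct seq_pair *p)
{
	struct pair_snapshot snap;
	unsigned s1, s2;

	for (;;) {
		s1 = atomic_load_explicit(&p->seq, memory_order_acquire);
		if (s1 & 1)
			continue;	/* write in progress, try again */
		snap.lo = atomic_load_explicit(&p->lo, memory_order_relaxed);
		snap.hi = atomic_load_explicit(&p->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&p->seq, memory_order_relaxed);
		if (s1 == s2)
			return snap;	/* no write overlapped the copy */
	}
}
```

Even with the retry loop, the snapshot is only consistent as of the moment it was taken; by the time the caller looks at it, it may already be stale, which is the same objection as to the manual page's guarantee.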


A vague thing in the back of my brain is how this relates to the atomicity
of new fd allocation by syscalls like open().  POSIX says you get the
lowest available fd.  However, in a threaded environment this is difficult
to test as true or false from any one thread's viewpoint, because another
thread may have made such a call and may inspect its result before you do.
Lowest-fd is only truly visible from the full-process context.  I'm
mentioning this because threads were mentioned, and I want to know if what
Mark is describing is "too much":  Is it actually relevant that different
threads see different intermediate results?  Are these intermediate results
only verifiable as true from a whole-process viewpoint?
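A small sketch of the lowest-fd rule as seen from a single thread, assuming a POSIX system: close a descriptor, then dup() another one, and the freed slot comes straight back.  The function name lowest_fd_demo is hypothetical.  In a threaded program, another thread calling open() between the close() and the dup() could take that slot first, so this thread would see a different number even though the kernel still handed out the lowest free fd at every step.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical demo: stores the descriptor that was closed and the one
 * dup() returned.  Returns 0 on success, -1 on error.  In a
 * single-threaded process the two are equal: every fd below fds[0] was
 * already in use when pipe() allocated it, so after close(fds[0]) it is
 * the lowest free slot again.
 */
int
lowest_fd_demo(int *closed_fd, int *reused_fd)
{
	int fds[2];

	if (pipe(fds) == -1)
		return -1;
	*closed_fd = fds[0];
	close(fds[0]);			/* free the slot */
	*reused_fd = dup(fds[1]);	/* another thread could race us here */
	close(fds[1]);
	if (*reused_fd == -1)
		return -1;
	close(*reused_fd);
	return 0;
}
```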

As a final point, I think bringing RW sysctls into this is a bit of a
distraction.  First, writing them is root only.  They are probably only
changed from A -> B once, not numerous times while being inspected by
other (non-root) processes.

The cost of sysctl has to be improved; the locking impacts are really brutal.
Over two decades there have been various large-axe approaches which didn't
work out.  So maybe this strategy of whittling away at the most significant
ones is worth it?