From: "Theo de Raadt"
Subject: Re: ip sysctl atomic
To: Mark Kettenis
Cc: Alexander Bluhm, tech@openbsd.org, claudio@openbsd.org
Date: Sat, 04 May 2024 14:34:53 -0600

I think this sentence in the sysctl(2) manual page is the real source of
this problem:

     Unless explicitly noted below, sysctl() returns a consistent
     snapshot of the data requested.  Consistency is obtained by
     locking the destination buffer into memory so that the data may
     be copied out without blocking.  Calls to sysctl() are serialized
     to avoid deadlock.

What does that even mean, in a modern world?

So the kernel provides a consistent view.  Great.  But the threads and
processes don't necessarily inspect the result immediately, so do they
even know the order they completed in?

I think the locking being described here is impossible to do
efficiently, and if that's the case, maybe we should realize
variable-sized things are too expensive and start removing sysctls from
libc and just using #define constants instead?  I mean, that's where
things go, if we can't make this efficient.

A vague thing in the back of my brain is how this relates to the
atomicity of new fd allocation from syscalls like open().  POSIX says
you get the lowest fd.  However, in a threaded environment, if you tried
to test for this you'd find that from a thread's viewpoint it is
difficult to establish as true or false, because another thread may have
done such a call and will inspect its result before you.  lowest-fd is
only truly visible from the full-process context.

I'm mentioning this because threads were mentioned and I want to know
if what Mark is describing is "too much": Is it actually relevant that
different threads see different intermediate results?  Are these
intermediate results actually only discoverable to be true from a whole
process viewpoint?

As a final point, I think bringing RW sysctls into this is a bit of a
distraction.  First, that's root only.
They are probably only being changed from A -> B once, and not numerous
times while being inspected by other (non-root) processes.

The cost of sysctl has to be improved, the locking impacts are really
brutal.  Over two decades there have been various large-axe approaches
which didn't work out.  So maybe this strategy of whittling away at the
most significant ones is worth it?
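
The lowest-fd point above can be illustrated with a small C sketch. It is
not part of the original message: the function name lowest_fd_observed()
and the use of /dev/null are assumptions chosen purely for illustration.
In a single-threaded process the check is deterministic. If other threads
call open() or close() concurrently, the same check can fail even though
the kernel followed the POSIX rule at the moment of allocation, which is
the sense in which lowest-fd is only observable from the whole-process
viewpoint.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if open() handed back the descriptor we just freed,
 * 0 if it handed back something else, -1 if setup failed.
 */
int
lowest_fd_observed(void)
{
	int a, b, c, ok;

	/* Allocate two descriptors; each is the lowest free one at the time. */
	a = open("/dev/null", O_RDONLY);
	if (a == -1)
		return -1;
	b = open("/dev/null", O_RDONLY);
	if (b == -1) {
		close(a);
		return -1;
	}

	/* Holds when no other thread touches the fd table in between. */
	assert(a < b);

	/* Free the lower one; every descriptor below it is still in use. */
	close(a);

	/* POSIX: open() returns the lowest descriptor not currently open. */
	c = open("/dev/null", O_RDONLY);
	ok = (c == a);

	/* Clean up. */
	close(b);
	if (c != -1)
		close(c);
	return ok;
}
```

Between the close(a) and the final open(), a second thread could take
descriptor a for itself. The calling thread would then see c != a and
could not tell, from its own results alone, whether the rule was broken
or another thread simply got there first.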