From: Mark Kettenis Subject: Re: ip sysctl atomic To: Theo de Raadt Cc: alexander.bluhm@gmx.net, claudio@openbsd.org, tech@openbsd.org Date: Sat, 18 May 2024 20:39:30 +0200 > From: Theo de Raadt > Date: Fri, 17 May 2024 13:33:14 -0600 (MDT) > > I think this is going off the rails. > > I think some fictional "everyone must see the same value" is being > pushed here. > > Can someone please provide an actual real example where this matters? > > I asked before, let me ask again. The man page says: > > Unless explicitly noted below, sysctl() returns a consistent snapshot of > the data requested. Consistency is obtained by locking the destination > buffer into memory so that the data may be copied out without blocking. > Calls to sysctl() are serialized to avoid deadlock. > > I think the last sentence is incompatible with MP, and we should remove it. > Or add a 100 new biglocks, right? That last sentence is a weird implementation detial to mention in a man page. Someone was really proud of what they did it seems. The idea of wiring pages to avoid page faults while copying out large amounts of data isn't necessarily incompatible with MP, but the current implementation does make it rather easy to introduce lock order reversals. It is also entirely unnecessary for sysctls that are a single int or a small struct. > Next sentence > > Consistency is obtained by locking the destination > buffer into memory so that the data may be copied out without blocking. > > That has always been untrue. vslock is an incredibly huge block. It was true back in the days when MP kernel didn't exist. These days you also need a lock to guarantee consistency, like the kernel lock or the net lock. > So really it comes down to: > > Unless explicitly noted below, sysctl() returns a consistent snapshot of > the data requested. > > bluhm is trying to fix requests of one int, or one long. Right, and as long as those are naturally aligned and don't cross a page boundary. The values will always be correct... > Consistant with respect to what?? ...since a single integer is always consistent with itself. > The codepaths he's changing don't sleep. Not 100% sure about that. The sysctl codepaths themselves will only sleep at edges where we grab locks. But the code paths in the network stack may sleep at other points. Anyway, with the locks removed there may be problems in code that usesthe global variables, even without sleep points. > Can someone PLEASE tell me of an int/long sysctl value that will be > read by multiple threads, *while* another thread is changing it (and I mean > one change) -- and where it is OBSERVABLE in the multiple readers that > something wasn't consistant? If we were to remove the sysctl serialization lock (sysctl_lock) I believe it is possible to observe an inconsistency where if the variables starts with A, thread 1 changes it to B and thread 2 simultaniously changes it to C, thread 2 may see "A" as the old value, but the final value may end up being "B". This may not be a problem, but it is inconsistent when one assumes that sysctl does a simple swap operation. But it is relatively easy to fix by using an atomic swap operation. > Some will return a value *before* the write was started (OR after it was > started but did not finish yet), and others will return a value *after* > the write was started and finished. > > What's the problem? > > Please demonstrate a specific sysctl request where this is a real issue, > and if it is, let's go the other way and just reimpose the damn biglock > and tell people OpenBSD is always going to be slow. The real issue with the diff is that by dropping the net lock from the sysctl code paths the global variables that are used to make decisions in the network stack can now change while running code in the network stack that holds the net lock. We must guarantee consistency in those codepaths. If net.inet.ip.forwarding=1 and we go down the code path to forward the packet and further down that code path we check net.inet.ip.forwaring again and the value is now 0, the code may do the wrong thing.