From: Martin Pieuchot
Subject: Re: [PATCH] convert mpl ticket lock to anderson's lock
To: Mateusz Guzik, tech@openbsd.org
Date: Wed, 18 Feb 2026 09:08:02 +0100

On 17/02/26(Tue) 13:27, Theo de Raadt wrote:
> Claudio Jeker wrote:
> 
> > On Tue, Feb 17, 2026 at 12:25:16PM -0700, Theo de Raadt wrote:
> > > Theo de Raadt wrote:
> > > 
> > > > > Per that post, the primary problem concerns page allocation and
> > > > > the way mutexes are implemented
> > > > 
> > > > While looking at things in the pagedaemon, I've become concerned
> > > > about how uvm_lock_pageq() is used.
> > > > 
> > > > Look at how uvm_pageactivate() and uvm_pagedeactivate() must be
> > > > called unlocked.
> > > > 
> > > > So locked callers unlock temporarily to call them, resulting in
> > > > very strange unlock/lock/tiny-operation/unlock/lock sequences.
> > > > 
> > > > Without the locks the tiny operation would be instantaneous, but
> > > > this design means there is a tremendous bubble if someone else
> > > > wanted the lock.
> > > > 
> > > > I think these low-level changes in locking and mutexes will have
> > > > little effect if we are using them extremely poorly, which we are.
> > > 
> > > This is why.
> > > 
> > > revision 1.184
> > > date: 2025/12/10 08:38:18; author: mpi; state: Exp; lines: +44 -37; commitid: ACX7PGcPcpfRHo20;
> > > Push `pageqlock' dances inside uvm_page{de,}activate() & uvm_pagewire().
> > > 
> > > Tested during multiple bulks on amd64, i386, arm64 and sparc64 by
> > > jca@, phessler@ and sthen@.
> > > 
> > > This pushing of the lock into the lower-level functions in turn
> > > requires that many higher-level callers have to unlock in order to
> > > call a function which now locks, and upon return, relock for their
> > > own needs.
> > > 
> > > I don't think that reduces contention; it only hides it. But it
> > > definitely increases latency for the specific caller, because
> > > someone else wins the mutex during the two release windows.
> > > Parallelism may have increased, but the latency of critical
> > > functions calling these will suffer. Some of these critical
> > > functions, if they didn't do this dance, would finish quicker and
> > > the system would be less contended.
> > > 
> > > To me this looks like deck chairs successfully rearranged.
> > 
> > I actually think this diff introduced some locking issues where
> > objects get modified in multiple steps instead of atomically. If two
> > CPUs run in lockstep, the result could be a corrupt object.
> 
> I agree.
> 
> uvm_lock_pageq() is explicitly defined to just lock the queue.
> However, in many cases the calling code also implicitly uses it to
> provide atomicity for some operations upon the page being handled,
> because all similar atomicity-requiring sequences also occur inside
> uvm_lock_pageq() regions.
> 
> But when this lock is divided into 2-3 lock/unlock sequences, this
> introduces the ability for a shared mapping on the same page or object
> in a different context to interact with the page in ways which were
> previously blocked. Those code sequences only occur inside
> uvm_lock_pageq() locked regions, which is why it implicitly survives.

All contexts first try to grab the owner lock of a page and then check
for the BUSY flag. That is what prevents the fault handler, or any
other part of the VM subsystem, from messing with the page.

> This appears to be the design of the uvm layer, and without
> restructuring to avoid these bugs first, lock movement of this type
> is extremely dangerous.

I agree. My pending diffs are bug fixes. Deferring them would not help
us.