From: Visa Hankala
Subject: Re: smr memory barrier
To: Alexander Bluhm
Cc: tech@openbsd.org
Date: Tue, 11 Feb 2025 16:27:07 +0000

On Mon, Feb 10, 2025 at 09:57:29PM +0100, Alexander Bluhm wrote:
> Hi,
>
> Usually membar producer and consumer should be symmetric.
>
> If the CPU that runs SMR_PTR_SET_LOCKED() wants to make sure that
> other CPUs see the changes of the object before they see the new
> pointer, it needs a membar_producer().
>
> But then the other CPUs must call membar_consumer() to read the
> data, that had been modified before the new pointer was published.
>
> If this is the intended semantics of SMR, we must add a membar_consumer()
> to SMR_PTR_GET().  Otherwise, if the macros don't care and expect
> the caller to insert barriers, the membar_producer() should be
> removed from SMR_PTR_SET_LOCKED().  I think it is less error prone
> if the SMR_PTR_GET() and SMR_PTR_SET_LOCKED() handle both.

As Claudio has already noted, the current memory barriers in the
SMR_PTR_*() macros should be enough to prevent unwanted reordering
by the CPUs.

The explicit memory barrier in SMR_PTR_SET_LOCKED() ensures that the
value (in fact, any write before the macro) becomes visible
system-wide before the pointer itself gets written.  On the read
side, the data dependency on the pointer load in SMR_PTR_GET() is
typically sufficient to prevent unwanted reordering; the CPU has to
know the value of the pointer before it can start accessing the data
behind that pointer.  (A small sketch of this pairing is at the end
of this mail.)

I suppose a very speculative CPU could guess the pointer value and
issue the dependent loads speculatively.  However, I doubt this would
make sense in practice.  Just from a statistical point of view, the
guessed pointer would usually be wrong and the speculative loads
would have to be discarded.

That said, apparent reordering of dependent loads can occur on some
processors, namely certain models of Alpha.  The affected Alpha CPUs
have their caches split into two banks that have their own
connections to the system interconnect and operate independently of
each other most of the time.  If the pointer and the dependent data
fall into different cache banks, there is a slight chance that the
bank holding the dependent data lags behind the bank holding the
pointer, and the dependent load can then return a stale value.  This
very special case should be covered to a sufficient degree by the
membar_datadep_consumer() barrier in the READ_ONCE() macro.

In practice, modern processors and their memory subsystems should
never reorder dependent loads.  At least this is the assumption in
Linux.

> Index: sys/smr.h
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/sys/smr.h,v
> diff -u -p -U7 -r1.9 smr.h
> --- sys/smr.h	25 Jul 2022 08:06:44 -0000	1.9
> +++ sys/smr.h	10 Feb 2025 20:47:30 -0000
> @@ -65,15 +65,19 @@ smr_init(struct smr_entry *smr)
>  #else
>  #define SMR_ASSERT_CRITICAL()		do {} while (0)
>  #define SMR_ASSERT_NONCRITICAL()	do {} while (0)
>  #endif
>  
>  #endif /* _KERNEL */
>  
> -#define SMR_PTR_GET(pptr)		READ_ONCE(*pptr)
> +#define SMR_PTR_GET(pptr) ({						\
> +	typeof(*pptr) __ptr = *(volatile typeof(*pptr) *)(pptr);	\
> +	membar_consumer();						\
> +	__ptr;								\
> +})
>  
>  #define SMR_PTR_GET_LOCKED(pptr)	(*(pptr))
>  
>  #define SMR_PTR_SET_LOCKED(pptr, val) do {				\
>  	membar_producer();						\
>  	WRITE_ONCE(*pptr, val);						\
>  } while (0)
> 
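To illustrate the pairing described above, here is a minimal sketch of
a writer and a lockless reader.  The struct foo, foo_ptr and foo_lock
names are made up for the example; only the SMR_PTR_*() macros and
smr_read_enter()/smr_read_leave() are the real interfaces, and the
code is only meant to show where each barrier acts, not to be actual
kernel code.

struct foo {
	int	f_value;
};

/* Written under foo_lock, read inside an SMR critical section. */
struct foo *foo_ptr;

/* Writer side; the caller holds foo_lock. */
void
foo_publish(struct foo *new)
{
	new->f_value = 42;	/* initialize the object first */

	/*
	 * The membar_producer() inside SMR_PTR_SET_LOCKED() makes the
	 * initialization above visible to other CPUs before the
	 * pointer store becomes visible.
	 */
	SMR_PTR_SET_LOCKED(&foo_ptr, new);
}

/* Reader side; no lock. */
int
foo_get_value(void)
{
	struct foo *f;
	int v = -1;

	smr_read_enter();
	f = SMR_PTR_GET(&foo_ptr);
	if (f != NULL) {
		/*
		 * The CPU cannot issue this dependent load before it
		 * knows the value of f, so no explicit
		 * membar_consumer() is needed here.
		 */
		v = f->f_value;
	}
	smr_read_leave();

	return v;
}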
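For reference, the reason the existing SMR_PTR_GET() covers even the
Alpha case is that READ_ONCE() couples the volatile load with the
data-dependency barrier.  Roughly like this; a sketch from memory,
not a verbatim copy from the tree:

/*
 * Approximate shape of READ_ONCE().  membar_datadep_consumer() is a
 * real barrier only on Alpha and a no-op elsewhere, so the common
 * case pays nothing while the split-cache Alphas still get the
 * dependent loads ordered after the pointer load.
 */
#define READ_ONCE(x) ({							\
	typeof(x) __tmp = *(volatile typeof(x) *)&(x);			\
	membar_datadep_consumer();					\
	__tmp;								\
})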