From: Mark Kettenis Subject: Re: powerpc64: isync isn't a memory barrier To: George Koehler Cc: tech@openbsd.org Date: Wed, 12 Jun 2024 22:20:53 +0200 > Date: Wed, 12 Jun 2024 00:00:46 -0400 > From: George Koehler > > On powerpc64, I want to stop using "isync" as a memory barrier. > This diff uses "sync". I explain below. > > Index: arch/powerpc64/include/atomic.h > =================================================================== > RCS file: /cvs/src/sys/arch/powerpc64/include/atomic.h,v > diff -u -p -r1.3 atomic.h > --- arch/powerpc64/include/atomic.h 29 Aug 2022 02:01:18 -0000 1.3 > +++ arch/powerpc64/include/atomic.h 10 Jun 2024 19:53:48 -0000 > @@ -276,10 +276,10 @@ _atomic_addic_long_nv(volatile unsigned > #define __membar(_f) do { __asm volatile(_f ::: "memory"); } while (0) > > #if defined(MULTIPROCESSOR) || !defined(_KERNEL) > -#define membar_enter() __membar("isync") > +#define membar_enter() __membar("sync") > #define membar_exit() __membar("sync") > #define membar_producer() __membar("sync") > -#define membar_consumer() __membar("isync") > +#define membar_consumer() __membar("sync") > #define membar_sync() __membar("sync") > #else > #define membar_enter() __membar("") > > "isync" is an instruction barrier. PowerPC Rev. 1 said, > > | Executing an *isync* instruction ensures that all instructions > | preceding the *isync* instruction have completed before the *isync* > | instruction completes, except that memory accesses caused by those > | instructions need not have been performed with respect to other > | processors and mechanisms. > > So "isync" is no barrier for memory accesses (loads or stores). As > long as our membar_consumer(9) is "isync", it might fail to order the > preceding loads before the following loads. This "isync" in > /sys/kern/kern_sig.c cursig() looks wrong, > > ps_siglist = READ_ONCE(pr->ps_siglist); > membar_consumer(); > mask = SIGPENDING(p); > > The intent might be to load ps_siglist before mask, but powerpc64 > membar_consumer does "isync", might load them in the wrong order. > > Mac OS on macppc had used "isync" as a load barrier. The (Mac G4) > MPC7450 manual said, > > | The *isync* instruction is not executed until all previous > | instructions complete to the point where they cannot cause an > | exception. The *isync* instruction does not wait for all pending > | stores in the store queue to complete. > > Because it mentions only stores, a reader might believe that *isync* > does complete loads. Also, the (Mac G5) IBM970FX manual said, > > | In addition, the *isync* instruction is often used as a _load > | barrier_ to prevent any subsequent load (or store) instructions > | from executing before previous load instruction have been > | completed. > > I suspect that "isync" is not a load barrier on newer powerpc64 > processors like POWER9. One can still use "isync" to order a > conditional branch before a memory access (as Power ISA 3.0 B, Book > II, B.2.1.1 uses "isync" after acquiring a lock). Most of our > membar_enter calls and some of our membar_consume calls are after a > conditional branch. > > My 4-core POWER9 observed no change with this diff. An 8-core POWER9 > in bugs@ "powerpc64/pmap.c trouble report" was running this diff, but > I don't know whether the diff helped, or the kernel got lucky. > > This diff uses "sync" to order loads and stores. Later, I might > propose a 2nd diff to start using "lwsync" on powerpc64. I wouldn't > use "lwsync" on macppc. Linux doesn't seem to use "isync" as a barrier, so this is probably a good idea.