From: George Koehler Subject: powerpc64: isync isn't a memory barrier To: tech@openbsd.org Date: Wed, 12 Jun 2024 00:00:46 -0400 On powerpc64, I want to stop using "isync" as a memory barrier. This diff uses "sync". I explain below. Index: arch/powerpc64/include/atomic.h =================================================================== RCS file: /cvs/src/sys/arch/powerpc64/include/atomic.h,v diff -u -p -r1.3 atomic.h --- arch/powerpc64/include/atomic.h 29 Aug 2022 02:01:18 -0000 1.3 +++ arch/powerpc64/include/atomic.h 10 Jun 2024 19:53:48 -0000 @@ -276,10 +276,10 @@ _atomic_addic_long_nv(volatile unsigned #define __membar(_f) do { __asm volatile(_f ::: "memory"); } while (0) #if defined(MULTIPROCESSOR) || !defined(_KERNEL) -#define membar_enter() __membar("isync") +#define membar_enter() __membar("sync") #define membar_exit() __membar("sync") #define membar_producer() __membar("sync") -#define membar_consumer() __membar("isync") +#define membar_consumer() __membar("sync") #define membar_sync() __membar("sync") #else #define membar_enter() __membar("") "isync" is an instruction barrier. PowerPC Rev. 1 said, | Executing an *isync* instruction ensures that all instructions | preceding the *isync* instruction have completed before the *isync* | instruction completes, except that memory accesses caused by those | instructions need not have been performed with respect to other | processors and mechanisms. So "isync" is no barrier for memory accesses (loads or stores). As long as our membar_consumer(9) is "isync", it might fail to order the preceding loads before the following loads. This "isync" in /sys/kern/kern_sig.c cursig() looks wrong, ps_siglist = READ_ONCE(pr->ps_siglist); membar_consumer(); mask = SIGPENDING(p); The intent might be to load ps_siglist before mask, but powerpc64 membar_consumer does "isync", might load them in the wrong order. Mac OS on macppc had used "isync" as a load barrier. The (Mac G4) MPC7450 manual said, | The *isync* instruction is not executed until all previous | instructions complete to the point where they cannot cause an | exception. The *isync* instruction does not wait for all pending | stores in the store queue to complete. Because it mentions only stores, a reader might believe that *isync* does complete loads. Also, the (Mac G5) IBM970FX manual said, | In addition, the *isync* instruction is often used as a _load | barrier_ to prevent any subsequent load (or store) instructions | from executing before previous load instruction have been | completed. I suspect that "isync" is not a load barrier on newer powerpc64 processors like POWER9. One can still use "isync" to order a conditional branch before a memory access (as Power ISA 3.0 B, Book II, B.2.1.1 uses "isync" after acquiring a lock). Most of our membar_enter calls and some of our membar_consume calls are after a conditional branch. My 4-core POWER9 observed no change with this diff. An 8-core POWER9 in bugs@ "powerpc64/pmap.c trouble report" was running this diff, but I don't know whether the diff helped, or the kernel got lucky. This diff uses "sync" to order loads and stores. Later, I might propose a 2nd diff to start using "lwsync" on powerpc64. I wouldn't use "lwsync" on macppc. --gkoehler