Download raw body.
powerpc64: isync isn't a memory barrier
On powerpc64, I want to stop using "isync" as a memory barrier.
This diff uses "sync". I explain below.
Index: arch/powerpc64/include/atomic.h
===================================================================
RCS file: /cvs/src/sys/arch/powerpc64/include/atomic.h,v
diff -u -p -r1.3 atomic.h
--- arch/powerpc64/include/atomic.h 29 Aug 2022 02:01:18 -0000 1.3
+++ arch/powerpc64/include/atomic.h 10 Jun 2024 19:53:48 -0000
@@ -276,10 +276,10 @@ _atomic_addic_long_nv(volatile unsigned
#define __membar(_f) do { __asm volatile(_f ::: "memory"); } while (0)
#if defined(MULTIPROCESSOR) || !defined(_KERNEL)
-#define membar_enter() __membar("isync")
+#define membar_enter() __membar("sync")
#define membar_exit() __membar("sync")
#define membar_producer() __membar("sync")
-#define membar_consumer() __membar("isync")
+#define membar_consumer() __membar("sync")
#define membar_sync() __membar("sync")
#else
#define membar_enter() __membar("")
"isync" is an instruction barrier. PowerPC Rev. 1 said,
| Executing an *isync* instruction ensures that all instructions
| preceding the *isync* instruction have completed before the *isync*
| instruction completes, except that memory accesses caused by those
| instructions need not have been performed with respect to other
| processors and mechanisms.
So "isync" is no barrier for memory accesses (loads or stores). As
long as our membar_consumer(9) is "isync", it might fail to order the
preceding loads before the following loads. This "isync" in
/sys/kern/kern_sig.c cursig() looks wrong,
ps_siglist = READ_ONCE(pr->ps_siglist);
membar_consumer();
mask = SIGPENDING(p);
The intent might be to load ps_siglist before mask, but powerpc64
membar_consumer does "isync", might load them in the wrong order.
Mac OS on macppc had used "isync" as a load barrier. The (Mac G4)
MPC7450 manual said,
| The *isync* instruction is not executed until all previous
| instructions complete to the point where they cannot cause an
| exception. The *isync* instruction does not wait for all pending
| stores in the store queue to complete.
Because it mentions only stores, a reader might believe that *isync*
does complete loads. Also, the (Mac G5) IBM970FX manual said,
| In addition, the *isync* instruction is often used as a _load
| barrier_ to prevent any subsequent load (or store) instructions
| from executing before previous load instruction have been
| completed.
I suspect that "isync" is not a load barrier on newer powerpc64
processors like POWER9. One can still use "isync" to order a
conditional branch before a memory access (as Power ISA 3.0 B, Book
II, B.2.1.1 uses "isync" after acquiring a lock). Most of our
membar_enter calls and some of our membar_consume calls are after a
conditional branch.
My 4-core POWER9 observed no change with this diff. An 8-core POWER9
in bugs@ "powerpc64/pmap.c trouble report" was running this diff, but
I don't know whether the diff helped, or the kernel got lucky.
This diff uses "sync" to order loads and stores. Later, I might
propose a 2nd diff to start using "lwsync" on powerpc64. I wouldn't
use "lwsync" on macppc.
--gkoehler
powerpc64: isync isn't a memory barrier