Index | Thread | Search

From:
"Theo de Raadt" <deraadt@openbsd.org>
Subject:
arm64 -fret-clean attempt
To:
tech@cvs.openbsd.org
Date:
Mon, 01 Jul 2024 23:50:45 -0600

Download raw body.

Thread
I've been trying to write -fret-clean for arm64.

On a return-stack architecture like amd64, the callee has to clean up the
word on the stack upon return.

arm64, like some other risc architectures, is a link-register architecture.
In this case, the return address is saved in some temporary location by
the caller, who loads it into the link register before returning.  Before
that moment, the caller has to clean it up.

After running around in the swamp for a while, I came up with this variation
that inserts the correct instruction at the correct place.  It works, as long
as the optimizer is turned off with -O0.

But as soon as the optimizer is turned on, it appears that
AArch64LoadStoreOpt::mergeUpdateInsn gets confused and re-arranges the
instruction incorrectly.  I think it has an assumption that the regions
is encounters will only be either prologue (containing only stores to
the stack), or epilogue (containing only restores, meaning loads, from
the stack), and it doesn't understand that I've added a store inbetween
these loads and it merges them incorrectly.  I've been unable to figure
out how to solve this.

Anyways, I'm throwing this incomplete code out there in case someone
else wants to take a shot at it.

Thanks.

Index: gnu/llvm/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
===================================================================
RCS file: /cvs/src/gnu/llvm/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp,v
diff -u -p -u -r1.4 AArch64FrameLowering.cpp
--- gnu/llvm/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp	11 Nov 2023 18:14:31 -0000	1.4
+++ gnu/llvm/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp	10 Jun 2024 04:57:01 -0000
@@ -240,6 +240,10 @@ static cl::opt<bool> EnableRedZone("aarc
                                    cl::desc("enable use of redzone on AArch64"),
                                    cl::init(false), cl::Hidden);
 
+static cl::opt<bool> EnableRetClean("aarch64-ret-clean",
+                                   cl::desc("enable use of ret-clean on AArch64"),
+                                   cl::init(false), cl::Hidden);
+
 static cl::opt<bool>
     ReverseCSRRestoreSeq("reverse-csr-restore-seq",
                          cl::desc("reverse the CSR restore sequence"),
@@ -2924,6 +2928,18 @@ bool AArch64FrameLowering::restoreCallee
         MachineMemOperand::MOLoad, Size, Alignment));
     if (NeedsWinCFI)
       InsertSEH(MIB, TII, MachineInstr::FrameDestroy);
+
+    if (EnableRetClean && !MF.exposesReturnsTwice() && Reg1 == AArch64::LR) {
+      printf("idx %d %d\n", FrameIdxReg1, FrameIdxReg2);
+      BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
+          .addReg(AArch64::XZR)
+          .addReg(AArch64::SP)
+          .addImm(RPI.Offset)
+          .addMemOperand(MF.getMachineMemOperand(
+              MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
+              MachineMemOperand::MOStore, 8, Align(8)))
+          .setMIFlag(MachineInstr::FrameDestroy);
+    }
 
     return MIB->getIterator();
   };
Index: gnu/llvm/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
===================================================================
RCS file: /cvs/src/gnu/llvm/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp,v
diff -u -p -u -r1.1.1.4 AArch64LoadStoreOptimizer.cpp
--- gnu/llvm/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp	11 Nov 2023 17:59:09 -0000	1.1.1.4
+++ gnu/llvm/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp	10 Jun 2024 05:17:50 -0000
@@ -1808,6 +1808,8 @@ AArch64LoadStoreOpt::mergeUpdateInsn(Mac
   if (Update->getOpcode() == AArch64::SUBXri)
     Value = -Value;
 
+// XXX I think it merges my (new) STORE 0 instruction with other LOADS here
+
   unsigned NewOpc = IsPreIdx ? getPreIndexedOpcode(I->getOpcode())
                              : getPostIndexedOpcode(I->getOpcode());
   MachineInstrBuilder MIB;