From: "Theo de Raadt" Subject: arm64 -fret-clean attempt To: tech@cvs.openbsd.org Date: Mon, 01 Jul 2024 23:50:45 -0600 I've been trying to write -fret-clean for arm64. On a return-stack architecture like amd64, the callee has to clean up the word on the stack upon return. arm64, like some other risc architectures, is a link-register architecture. In this case, the return address is saved in some temporary location by the caller, who loads it into the link register before returning. Before that moment, the caller has to clean it up. After running around in the swamp for a while, I came up with this variation that inserts the correct instruction at the correct place. It works, as long as the optimizer is turned off with -O0. But as soon as the optimizer is turned on, it appears that AArch64LoadStoreOpt::mergeUpdateInsn gets confused and re-arranges the instruction incorrectly. I think it has an assumption that the regions is encounters will only be either prologue (containing only stores to the stack), or epilogue (containing only restores, meaning loads, from the stack), and it doesn't understand that I've added a store inbetween these loads and it merges them incorrectly. I've been unable to figure out how to solve this. Anyways, I'm throwing this incomplete code out there in case someone else wants to take a shot at it. Thanks. Index: gnu/llvm/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp =================================================================== RCS file: /cvs/src/gnu/llvm/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp,v diff -u -p -u -r1.4 AArch64FrameLowering.cpp --- gnu/llvm/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp 11 Nov 2023 18:14:31 -0000 1.4 +++ gnu/llvm/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp 10 Jun 2024 04:57:01 -0000 @@ -240,6 +240,10 @@ static cl::opt EnableRedZone("aarc cl::desc("enable use of redzone on AArch64"), cl::init(false), cl::Hidden); +static cl::opt EnableRetClean("aarch64-ret-clean", + cl::desc("enable use of ret-clean on AArch64"), + cl::init(false), cl::Hidden); + static cl::opt ReverseCSRRestoreSeq("reverse-csr-restore-seq", cl::desc("reverse the CSR restore sequence"), @@ -2924,6 +2928,18 @@ bool AArch64FrameLowering::restoreCallee MachineMemOperand::MOLoad, Size, Alignment)); if (NeedsWinCFI) InsertSEH(MIB, TII, MachineInstr::FrameDestroy); + + if (EnableRetClean && !MF.exposesReturnsTwice() && Reg1 == AArch64::LR) { + printf("idx %d %d\n", FrameIdxReg1, FrameIdxReg2); + BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui)) + .addReg(AArch64::XZR) + .addReg(AArch64::SP) + .addImm(RPI.Offset) + .addMemOperand(MF.getMachineMemOperand( + MachinePointerInfo::getFixedStack(MF, FrameIdxReg1), + MachineMemOperand::MOStore, 8, Align(8))) + .setMIFlag(MachineInstr::FrameDestroy); + } return MIB->getIterator(); }; Index: gnu/llvm/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp =================================================================== RCS file: /cvs/src/gnu/llvm/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp,v diff -u -p -u -r1.1.1.4 AArch64LoadStoreOptimizer.cpp --- gnu/llvm/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp 11 Nov 2023 17:59:09 -0000 1.1.1.4 +++ gnu/llvm/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp 10 Jun 2024 05:17:50 -0000 @@ -1808,6 +1808,8 @@ AArch64LoadStoreOpt::mergeUpdateInsn(Mac if (Update->getOpcode() == AArch64::SUBXri) Value = -Value; +// XXX I think it merges my (new) STORE 0 instruction with other LOADS here + unsigned NewOpc = IsPreIdx ? getPreIndexedOpcode(I->getOpcode()) : getPostIndexedOpcode(I->getOpcode()); MachineInstrBuilder MIB;