arm64 LSE support in userland
On arm64 we now use the LSE feature to support atomic operations that scale better on modern systems with many CPU cores, but only in the kernel. Since userland uses atomic operations too, we should make them available there as well. Linux does this by enabling -moutline-atomics by default, so I propose to do the same on OpenBSD. The necessary functions are provided in libcompiler_rt. However, these functions rely on runtime detection of the LSE feature, which is something that doesn't happen on OpenBSD. So we need to fix this, otherwise we'll always use the LL/SC atomics.

The runtime detection code lives in gnu/llvm/compiler-rt/lib/builtins/cpu_model.c and we could add some code there that uses the various machdep.id_aa64* sysctls to get the relevant CPU feature ID registers and set the relevant feature flags. However, since that involves a system call I'm wondering whether there are any consequences for pledge(2). The feature detection code runs as a constructor, so it should run before any pledge(2) calls under normal circumstances. But shared libraries have their own instance of that constructor, so a dlopen(3) after pledge(2) will fail.

However, with newer ARMv8 and ARMv9 cores making it out there in products that people can actually buy, we start seeing processor feature checks popping up in more and more code bases. And I'm not sure that making extensive changes to the bits of code that perform these checks is viable in the long run. So maybe we need to export the processor features to userland in a way that is more aligned with what other OSes do.

Both Linux and FreeBSD do this through the AT_HWCAP and AT_HWCAP2 "auxiliary vectors". They use slightly different interfaces (Linux has getauxval(3), FreeBSD has elf_aux_info(3)) but the same #defines for the features. Both also support HWCAP_CPUID, which indicates that they support access to the (privileged) CPU feature ID registers from userland. Implementing that feature makes a lot of sense to me, as on x86 you can execute the CPUID instruction from userland. Yes, that means the instructions that read those registers will trap and have to be emulated in the kernel. But that does have the benefit that we can provide sanitized versions of the registers that hide certain dangerous features or features for which we lack the necessary kernel support.

Introducing getauxval(3) carries some risks, as its availability might be autodetected and enable code that assumes other Linux-specific AT_xxx vectors that we don't implement are available as well. Another issue may be that if we introduce AT_HWCAP on arm64 but not on other platforms, this might cause misdetections as well.
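
To make the AT_HWCAP idea a bit more concrete, here is a minimal sketch of the two halves involved: turning the LSE field of ID_AA64ISAR0_EL1 into a capability bit, and testing that bit the way a cpu_model.c-style constructor might. This is an illustration under assumptions, not actual OpenBSD or compiler-rt code. The ex_/EX_ names are hypothetical; the bit position mirrors HWCAP_ATOMICS from the Linux arm64 headers, and the field layout (Atomic in bits 23:20, values of 0b0010 and up meaning LSE is implemented) is taken from the Arm architecture definition of the register.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for HWCAP_ATOMICS. The value matches the Linux
 * arm64 definition; it is redefined here under an EX_ name so the sketch
 * compiles on any platform.
 */
#define EX_HWCAP_ATOMICS	(1UL << 8)

/* ID_AA64ISAR0_EL1.Atomic occupies bits [23:20]; >= 0b0010 means LSE. */
#define EX_ISAR0_ATOMIC_SHIFT	20
#define EX_ISAR0_ATOMIC_MASK	0xfULL
#define EX_ISAR0_ATOMIC_LSE	0x2ULL

/*
 * Kernel side (assumed shape): derive the AT_HWCAP value from the
 * sanitized ID register contents.
 */
unsigned long
ex_hwcap_from_isar0(uint64_t isar0)
{
	unsigned long hwcap = 0;
	uint64_t atomic;

	atomic = (isar0 >> EX_ISAR0_ATOMIC_SHIFT) & EX_ISAR0_ATOMIC_MASK;
	if (atomic >= EX_ISAR0_ATOMIC_LSE)
		hwcap |= EX_HWCAP_ATOMICS;
	return hwcap;
}

/*
 * Userland side (assumed shape): the check a feature-detection
 * constructor would make before selecting the LSE code path.
 */
int
ex_have_lse(unsigned long hwcap)
{
	return (hwcap & EX_HWCAP_ATOMICS) != 0;
}
```

In userland the argument to ex_have_lse() would come from getauxval(AT_HWCAP) on Linux or from elf_aux_info(AT_HWCAP, &hwcap, sizeof(hwcap)) on FreeBSD. Neither needs a system call, since the values are handed to the process at exec time, which is what sidesteps the pledge(2) concern.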