From:
Mark Kettenis <mark.kettenis@xs4all.nl>
Subject:
arm64 LSE support in userland
To:
tech@openbsd.org
Cc:
brad@comstyle.com
Date:
Thu, 04 Jul 2024 14:50:26 +0200

On arm64 we now use the LSE feature to support atomic operations that
scale better on modern systems with many CPU cores.  But only in the
kernel.  Since userland uses atomic operations too, we should make
them available there too.

Linux does this by enabling -moutline-atomics by default.  So I
propose to do the same on OpenBSD.  The necessary functions are provided
in libcompiler_rt.  However, these functions rely on runtime detection
of the LSE feature, which is something that doesn't happen on OpenBSD.
So we need to fix this, otherwise we'll always use the LL/SC atomics.
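
To illustrate what the outlined helpers do, here is a rough sketch of
the dispatch.  It is not the libcompiler_rt code: the names are made
up, and both paths are written with portable C11 atomics as stand-ins
for the real LL/SC loop and the single LSE instruction.  The point is
only that every atomic operation becomes a call that branches on a
flag set by the runtime detection, so if that flag is never set we
always end up on the LL/SC path.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the flag that runtime detection sets. */
bool have_lse_sketch = false;

/* Stand-in for the LL/SC sequence: retry until the update succeeds. */
static int
fetch_add_llsc_sketch(_Atomic int *p, int v)
{
	int old = atomic_load_explicit(p, memory_order_relaxed);

	/* On failure 'old' is refreshed with the current value. */
	while (!atomic_compare_exchange_weak(p, &old, old + v))
		continue;
	return old;
}

/* Stand-in for a single LSE instruction doing the whole update. */
static int
fetch_add_lse_sketch(_Atomic int *p, int v)
{
	return atomic_fetch_add(p, v);
}

/* Out-of-line helper: one branch on the flag, then the chosen path. */
int
fetch_add_outline_sketch(_Atomic int *p, int v)
{
	if (have_lse_sketch)
		return fetch_add_lse_sketch(p, v);
	return fetch_add_llsc_sketch(p, v);
}
```

Both paths return the previous value, so callers cannot tell which
one ran; only the scalability differs.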

The runtime detection code lives in

  gnu/llvm/compiler-rt/lib/builtins/cpu_model.c

and we could add some code there that uses the various
machdep.id_aa64* sysctls to get the relevant CPU feature ID registers
and set the relevant feature flags.  However, since that involves a
system call I'm wondering whether there are any consequences for
pledge(2).  The feature detection code runs as a constructor, so it
should run before any pledge(2) calls under normal circumstances.  But
shared libraries have their own instance of that constructor so a
dlopen(3) after pledge(2) will fail.
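
A minimal sketch of what such detection code might look like follows.
It is not a proposed diff for cpu_model.c.  I'm assuming that the
machdep.id_aa64isar0 sysctl is reachable through a CPU_ID_AA64ISAR0
mib constant in <machine/cpu.h>; the function names are invented for
the example.  The decoding relies on the Atomic field of
ID_AA64ISAR0_EL1 being bits [23:20], where a value of 2 or higher
means the LSE instructions are implemented.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__OpenBSD__) && defined(__aarch64__)
#include <sys/types.h>
#include <sys/sysctl.h>
#include <machine/cpu.h>	/* assumed to provide CPU_ID_AA64ISAR0 */
#endif

/* ID_AA64ISAR0_EL1.Atomic lives in bits [23:20]; >= 2 means LSE. */
bool
isar0_has_lse(uint64_t isar0)
{
	return ((isar0 >> 20) & 0xf) >= 2;
}

/* Returns true if LSE is reported, false if absent or undetectable. */
bool
detect_lse_sketch(void)
{
#if defined(__OpenBSD__) && defined(__aarch64__)
	int mib[2] = { CTL_MACHDEP, CPU_ID_AA64ISAR0 };
	uint64_t isar0 = 0;
	size_t len = sizeof(isar0);

	/* This is the system call the pledge(2) concern is about. */
	if (sysctl(mib, 2, &isar0, &len, NULL, 0) == -1)
		return false;
	return isar0_has_lse(isar0);
#else
	/* Other platforms: nothing to query in this sketch. */
	return false;
#endif
}
```

Falling back to "no LSE" on failure keeps the LL/SC atomics working,
which is the behaviour we have today.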

However, with newer ARMv8 and ARMv9 cores making it out there in
products that people can actually buy, we start seeing processor
feature checks popping up in more and more code bases.  And I'm not
sure that making extensive changes to the bits of code that perform
these checks is viable in the long run.  So maybe we need to export
the processor features to userland in a way that is more aligned with
what other OSes do.

Both Linux and FreeBSD do this through AT_HWCAP and AT_HWCAP2
"auxiliary vectors".  They use slightly different interfaces (Linux
has getauxval(3), FreeBSD has elf_aux_info(3) but the same #defines
for the features).  Both also support HWCAP_CPUID, which indicates
that they support access to the (privileged) CPU feature ID registers
from userland.  Implementing that feature makes a lot of sense to me
as on x86 you can execute the CPUID instruction from userland.  Yes,
that means the register reads will trap and have to be emulated in the
kernel.  But that does have the benefit that we can provide sanitized
versions of them that hide certain dangerous features or features for
which we lack the necessary kernel support.
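
For reference, this is roughly how code consumes the interface on
those two systems today.  It is a sketch, not an OpenBSD API proposal:
the function names and the SKETCH_* constants are made up, and the bit
positions mirror the Linux arm64 HWCAP_ATOMICS and HWCAP_CPUID values
as I understand them.  The bits only have this meaning on arm64.

```c
#include <stdbool.h>

#if defined(__linux__)
#include <sys/auxv.h>		/* getauxval(3), AT_HWCAP */
#elif defined(__FreeBSD__)
#include <sys/auxv.h>		/* elf_aux_info(3) */
#include <elf.h>		/* AT_HWCAP */
#endif

/* Local copies of the arm64 bits; assumed to match HWCAP_ATOMICS
 * and HWCAP_CPUID as defined on Linux. */
#define SKETCH_HWCAP_ATOMICS	(1UL << 8)
#define SKETCH_HWCAP_CPUID	(1UL << 11)

/* Fetch AT_HWCAP; returns 0 where no such interface exists. */
unsigned long
read_hwcap_sketch(void)
{
#if defined(__linux__)
	return getauxval(AT_HWCAP);
#elif defined(__FreeBSD__)
	unsigned long hwcap = 0;

	if (elf_aux_info(AT_HWCAP, &hwcap, (int)sizeof(hwcap)) != 0)
		return 0;
	return hwcap;
#else
	return 0;
#endif
}

/* True if the kernel advertises the LSE atomics. */
bool
hwcap_has_lse(unsigned long hwcap)
{
	return (hwcap & SKETCH_HWCAP_ATOMICS) != 0;
}

/* True if userland may read the (emulated) feature ID registers. */
bool
hwcap_has_cpuid(unsigned long hwcap)
{
	return (hwcap & SKETCH_HWCAP_CPUID) != 0;
}
```

Note that no system call is made at lookup time: the values are
handed to the process at exec time.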

Introducing getauxval(3) carries some risks as its availability might
be autodetected and enable code that assumes other Linux-specific
AT_xxx vectors that we don't implement are available as well.  Another
issue may be that if we introduce AT_HWCAP on arm64 but not on other
platforms this might cause misdetections as well.