From: Mark Kettenis
Subject: Re: arm64 LSE support in userland: introduce elf_aux_info?
To: Stuart Henderson
Cc: j@bitminer.ca, tech@openbsd.org
Date: Fri, 12 Jul 2024 12:59:09 +0200

> Date: Fri, 12 Jul 2024 10:57:40 +0100
> From: Stuart Henderson
>
> On 2024/07/12 11:22, Jeremie Courreges-Anglas wrote:
> > On Thu, Jul 11, 2024 at 09:17:44AM -0400, j@bitminer.ca wrote:
> >
> > > 1) you are filtering the truth and providing a BSD-specific set of
> > > "interpretations" of the truth.
> > >
> > > The "truth" is the contents of CPU-capability registers. Why provide
> > > an opinionated interpretation of these, instead of just readable
> > > copies?
> >
> > There are features we want to hide from userland and some features
> > that we want to expose. Among the latter, some features may require
> > kernel support, and exposing those without kernel support would be
> > wrong.
>
> We already saw that on amd64 with AVX512.

Yes, so we won't advertise SVE, SVE2 and SME on OpenBSD for now, even
if the hardware supports it.

> > > [...]
> >
> > > Look at the patch suggested by @jca in elf.h for HWCAP2, there is
> > > only 19 bits left unused. How many years of ARM architecture
> > > evolution will that last? Three? Four?
> >
> > The solution to that problem appears to be: use HWCAP_CPUID to tell
> > whether you can call cpu-specific cpuid instructions. Those would
> > have to be emulated and sanitized by the kernel on some architectures.
> > See Mark's previous mails.
>
> Seems a sane approach.
>
> On 2024/07/11 09:17, j@bitminer.ca wrote:
> > Here is a good example, the "blis" high-performance linpack substitute.
> > They have 1390 lines of code (bli_cpuid.c) to transform amd64
> > capabilities, aarch64 capabilities, arm7, and power capabilities
> > to internally used compile and runtime flags. They already use
> > getauxval but only as an introduction to aarch64 cpu analysis. The
> > rest is parsing available CPU capability registers, or parsing
> > /proc/cpuinfo on Linux.
> >
> > "blis" uses every trick in the book and they still get it somewhat
> > wrong (their aarch64 code is a mess.) I'm working on an OpenBSD
> > port and I have to parse /var/run/dmesg.boot to get the flags needed
> > for aarch64.
>
> In general we don't need to support every trick for userland to detect
> every possible cpu feature. Some are definitely useful but having a
> whole bunch of various codepaths taken at runtime depending on cpu type
> (especially with the variety available on aarch64) makes debugging hard.
> In particular you may well find that some of these codepaths break
> branch-target CFI. This isn't a "performance above everything" OS - I'd
> argue that reducing the number of variations at runtime (while still
> supporting some carefully chosen most-useful ones) is actually a benefit
> for us.

Agreed. There are clear benefits to exposing LSE. But I'm not sure
there are any real benefits in supporting things like SM4 crypto at
this moment.
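
For illustration only, here is a minimal sketch of the kind of userland
check discussed in this thread: test a single capability bit (LSE
atomics) in the AT_HWCAP word and pick a code path. The SKETCH_* names
and the helper functions are hypothetical. The bit positions are those
of the Linux arm64 HWCAP ABI and are assumed here; an OpenBSD elf.h
may or may not use the same values. How the hwcap word is obtained is
left out; on a system that provides it, that would be something like
elf_aux_info(AT_HWCAP, &hwcap, sizeof(hwcap)).

```c
/*
 * Sketch only: SKETCH_* names are hypothetical. Bit values follow the
 * Linux arm64 HWCAP ABI and are an assumption for any other system.
 */
#define SKETCH_HWCAP_ATOMICS	(1UL << 8)	/* LSE atomic instructions */
#define SKETCH_HWCAP_CPUID	(1UL << 11)	/* ID registers readable from EL0 */

/* Return 1 if every bit in 'cap' is set in the hwcap word, else 0. */
static int
sketch_has_cap(unsigned long hwcap, unsigned long cap)
{
	return (hwcap & cap) == cap;
}

/*
 * Pick an atomics implementation from the hwcap word.
 * Returns 1 for the LSE path, 0 for the load/store-exclusive fallback.
 */
static int
sketch_use_lse(unsigned long hwcap)
{
	return sketch_has_cap(hwcap, SKETCH_HWCAP_ATOMICS);
}
```

A caller would fetch the hwcap word once at startup and branch on
sketch_use_lse(); keeping the decision to one carefully chosen bit is in
line with the point above about limiting runtime variations.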