arm64 LSE support in userland: introduce elf_aux_info?
On 2024/07/12 11:22, Jeremie Courreges-Anglas wrote:
> On Thu, Jul 11, 2024 at 09:17:44AM -0400, j@bitminer.ca wrote:
> >
> > 1) you are filtering the truth and providing a BSD-specific set of
> > "interpretations" of the truth.
> >
> > The "truth" is the contents of CPU-capability registers. Why provide
> > an opinionated interpretation of these, instead of just readable
> > copies?
>
> There are features we want to hide from userland and some features
> that we want to expose. Among the latter, some features may require
> kernel support, and exposing those without kernel support would be
> wrong. We already saw that on amd64 with AVX512.
>
> [...]
>
> > Look at the patch suggested by @jca in elf.h for HWCAP2, there is
> > only 19 bits left unused. How many years of ARM architecture
> > evolution will that last? Three? Four?
>
> The solution to that problem appears to be: use HWCAP_CPUID to tell
> whether you can call cpu-specific cpuid instructions. Those would
> have to be emulated and sanitized by the kernel on some architectures.
> See Mark's previous mails.

Seems a sane approach.

On 2024/07/11 09:17, j@bitminer.ca wrote:
> Here is a good example, the "blis" high-performance linpack substitute.
> They have 1390 lines of code (bli_cpuid.c) to transform amd64
> capabilities, aarch64 capabilities, arm7, and power capabilities
> to internally used compile and runtime flags. They already use
> getauxval but only as an introduction to aarch64 cpu analysis. The
> rest is parsing available CPU capability registers, or parsing
> /proc/cpuinfo on Linux.
>
> "blis" uses every trick in the book and they still get it somewhat
> wrong (their aarch64 code is a mess.) I'm working on an OpenBSD
> port and I have to parse /var/run/dmesg.boot to get the flags needed
> for aarch64.

In general we don't need to support every trick for userland to detect
every possible cpu feature. Some are definitely useful but having a
whole bunch of various codepaths taken at runtime depending on cpu type
(especially with the variety available on aarch64) makes debugging hard.
In particular you may well find that some of these codepaths break
branch-target CFI. This isn't a "performance above everything" OS - I'd
argue that reducing the number of variations at runtime (while still
supporting some carefully chosen most-useful ones) is actually a benefit
for us.
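
For illustration only, here is a minimal sketch of what the "small, carefully chosen set" approach could look like from userland: read the AT_HWCAP word once and test a single bit. It assumes the FreeBSD-style signature int elf_aux_info(int, void *, int) on the BSDs and getauxval() on Linux. The EX_* names and the helpers read_hwcap() and hwcap_has() are hypothetical, made up for this example. The bit positions are the ones Linux uses on arm64 and are only assumed to match elsewhere, so they are only meaningful on aarch64.

```c
#include <stdio.h>

#if defined(__linux__) || defined(__FreeBSD__) || defined(__OpenBSD__)
#include <sys/auxv.h>
#endif

/* Hypothetical names; values assumed from the Linux arm64 HWCAP layout. */
#define EX_HWCAP_ATOMICS (1UL << 8)   /* LSE atomic instructions */
#define EX_HWCAP_CPUID   (1UL << 11)  /* ID registers readable from userland */

/* Return the AT_HWCAP word, or 0 if it cannot be obtained. */
unsigned long read_hwcap(void)
{
#if defined(__linux__)
    return getauxval(AT_HWCAP);
#elif defined(__FreeBSD__) || defined(__OpenBSD__)
    unsigned long hwcap = 0;

    /* Assumed interface: returns 0 on success, an errno value otherwise. */
    if (elf_aux_info(AT_HWCAP, &hwcap, (int)sizeof(hwcap)) != 0)
        return 0;
    return hwcap;
#else
    return 0;
#endif
}

/* Pure helper: 1 if every bit in `bit` is set in `hwcap`, else 0. */
int hwcap_has(unsigned long hwcap, unsigned long bit)
{
    return (hwcap & bit) == bit;
}
```

A caller on aarch64 would do something like `if (hwcap_has(read_hwcap(), EX_HWCAP_ATOMICS))` once at startup and pick between exactly two codepaths, rather than branching on a long list of cpu types.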