From: Mark Kettenis <mark.kettenis@xs4all.nl>
Subject: Re: arm64 LSE support in userland: introduce elf_aux_info?
To: Stuart Henderson <stu@spacehopper.org>
Cc: j@bitminer.ca, tech@openbsd.org
Date: Fri, 12 Jul 2024 12:59:09 +0200

> Date: Fri, 12 Jul 2024 10:57:40 +0100
> From: Stuart Henderson <stu@spacehopper.org>
> 
> On 2024/07/12 11:22, Jeremie Courreges-Anglas wrote:
> > On Thu, Jul 11, 2024 at 09:17:44AM -0400, j@bitminer.ca wrote:
> > 
> > > 1) you are filtering the truth and providing a BSD-specific set of
> > > "interpretations" of the truth.
> > > 
> > > The "truth" is the contents of CPU-capability registers.  Why provide
> > > an opinionated interpretation of these, instead of just readable
> > > copies?
> > 
> > There are features we want to hide from userland and some features
> > that we want to expose.  Among the latter, some features may require
> > kernel support, and exposing those without kernel support would be
> > wrong.
> 
> We already saw that on amd64 with AVX512.

Yes, so we won't advertise SVE, SVE2 and SME on OpenBSD for now, even
if the hardware supports them.

> 
> > [...]
> > 
> > > Look at the patch suggested by @jca in elf.h for HWCAP2, there are
> > > only 19 bits left unused.  How many years of ARM architecture
> > > evolution will that last?  Three?  Four?
> > 
> > The solution to that problem appears to be: use HWCAP_CPUID to tell
> > whether you can call cpu-specific cpuid instructions.  Those would
> > have to be emulated and sanitized by the kernel on some architectures.
> > See Mark's previous mails.
> 
> Seems a sane approach.
> 
> On 2024/07/11 09:17, j@bitminer.ca wrote:
> > Here is a good example, the "blis" high-performance linpack substitute.
> > They have 1390 lines of code (bli_cpuid.c) to transform amd64
> > capabilities, aarch64 capabilities, arm7, and power capabilities
> > to internally used compile and runtime flags.  They already use
> > getauxval but only as an introduction to aarch64 cpu analysis.  The
> > rest is parsing available CPU capability registers, or parsing
> > /proc/cpuinfo on Linux.
> >
> > "blis" uses every trick in the book and they still get it somewhat
> > wrong (their aarch64 code is a mess.)  I'm working on an OpenBSD
> > port and I have to parse /var/run/dmesg.boot to get the flags needed
> > for aarch64.
> 
> In general we don't need to support every trick for userland to detect
> every possible cpu feature. Some are definitely useful but having a
> whole bunch of various codepaths taken at runtime depending on cpu type
> (especially with the variety available on aarch64) makes debugging hard.
> In particular you may well find that some of these codepaths break
> branch-target CFI. This isn't a "performance above everything" OS - I'd
> argue that reducing the number of variations at runtime (while still
> supporting some carefully chosen most-useful ones) is actually a benefit
> for us.

Agreed.  There are clear benefits to exposing LSE.  But I'm not sure
there are any real benefits in supporting things like SM4 crypto at
this moment.
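
For illustration only, here is a minimal sketch of how a userland
program might test the hwcap bits discussed in this thread.  It is not
code from the patch under discussion.  The SKETCH_HWCAP_* names are
hypothetical; their bit positions (8 for LSE atomics, 11 for CPUID
register access) are the Linux arm64 values and are only assumed to
match whatever OpenBSD ends up defining.  The elf_aux_info() branch
assumes the FreeBSD-style interface proposed in the subject line.

```c
#include <stdbool.h>
#include <stdio.h>

#if defined(__linux__) || defined(__OpenBSD__) || defined(__FreeBSD__)
#include <sys/auxv.h>	/* getauxval() or elf_aux_info(), AT_HWCAP */
#endif

/* Hypothetical names; bit positions assumed from Linux arm64. */
#define SKETCH_HWCAP_ATOMICS	(1UL << 8)	/* LSE atomics */
#define SKETCH_HWCAP_CPUID	(1UL << 11)	/* ID register reads allowed */

/* Pure helper: is the given capability bit set in hwcap? */
bool
hwcap_has(unsigned long hwcap, unsigned long bit)
{
	return (hwcap & bit) != 0;
}

/*
 * Fetch AT_HWCAP from the auxiliary vector.  Returns 0 (no features)
 * on non-arm64 builds, where these bits mean something else, and on
 * any failure.
 */
unsigned long
read_hwcap(void)
{
	unsigned long hwcap = 0;

#if defined(__aarch64__) && defined(__linux__)
	hwcap = getauxval(AT_HWCAP);
#elif defined(__aarch64__) && (defined(__OpenBSD__) || defined(__FreeBSD__))
	if (elf_aux_info(AT_HWCAP, &hwcap, sizeof(hwcap)) != 0)
		hwcap = 0;
#endif
	return hwcap;
}

/* Print what the kernel chose to advertise. */
void
report_features(void)
{
	unsigned long hwcap = read_hwcap();

	printf("LSE atomics: %s\n",
	    hwcap_has(hwcap, SKETCH_HWCAP_ATOMICS) ? "yes" : "no");
	printf("CPUID regs:  %s\n",
	    hwcap_has(hwcap, SKETCH_HWCAP_CPUID) ? "yes" : "no");
}
```

A program would call report_features(), or branch on hwcap_has()
directly, once at startup.  Since the kernel decides which bits to
set, a feature it does not support (SVE, for example) simply never
shows up here, whatever the hardware can do.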