From: j@bitminer.ca
Subject: Re: arm64 LSE support in userland: introduce elf_aux_info?
To: Mark Kettenis <mark.kettenis@xs4all.nl>
Cc: Stuart Henderson <stu@spacehopper.org>, tech@openbsd.org
Date: Fri, 12 Jul 2024 16:08:10 -0400

On 2024-07-12 06:59, Mark Kettenis wrote:
>> Date: Fri, 12 Jul 2024 10:57:40 +0100
>> From: Stuart Henderson <stu@spacehopper.org>
>> 
>> On 2024/07/12 11:22, Jeremie Courreges-Anglas wrote:
>> > On Thu, Jul 11, 2024 at 09:17:44AM -0400, j@bitminer.ca wrote:
>> >
>> > > 1) you are filtering the truth and providing a BSD-specific set of
>> > > "interpretations" of the truth.
>> > >
>> > > The "truth" is the contents of CPU-capability registers.  Why provide
>> > > an opinionated interpretation of these, instead of just readable
>> > > copies?
>> >
>> > There are features we want to hide from userland and some features
>> > that we want to expose.  Among the latter, some features may require
>> > kernel support, and exposing those without kernel support would be
>> > wrong.
>> 
>> We already saw that on amd64 with AVX512.

Some features are fairly complicated to understand; e.g. the set
of AVX instructions varies dramatically with the implementation
(code name of the CPU).  Fortunately cpuid on Intel is unprivileged,
so software can decide for itself; if AVX (or a future innovation)
isn't supported by the kernel then the porter has to understand this
and somehow restrict usage.  This is one more thing porters have to
do, aside from library dependencies, code patches, etc.  Presumably
other operating systems have or have had similar restrictions.
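
To illustrate the amd64 case, here is a rough sketch (mine, not taken
from any port) of the check software has to make: the CPU must
advertise the feature in cpuid, and the OS must have enabled the
matching register state in XCR0.  The bit positions are from the Intel
SDM; the function and macro names are made up for this example, and
the register values are assumed to have been fetched already with
cpuid and xgetbv.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=1):ECX feature bits. */
#define EX_CPUID1_ECX_OSXSAVE	(1u << 27)	/* OS has enabled XSAVE/XGETBV */
#define EX_CPUID1_ECX_AVX	(1u << 28)
/* CPUID.(EAX=7,ECX=0):EBX feature bit. */
#define EX_CPUID7_EBX_AVX512F	(1u << 16)

/* XCR0 bits: set by the OS for each register state it saves/restores. */
#define EX_XCR0_SSE		(1u << 1)
#define EX_XCR0_AVX		(1u << 2)
#define EX_XCR0_AVX512		(7u << 5)	/* opmask, ZMM_Hi256, Hi16_ZMM */

/* Hypothetical helper: AVX is usable only if both CPU and OS agree. */
bool
avx_usable(uint32_t leaf1_ecx, uint64_t xcr0)
{
	const uint32_t need = EX_CPUID1_ECX_OSXSAVE | EX_CPUID1_ECX_AVX;
	const uint64_t state = EX_XCR0_SSE | EX_XCR0_AVX;

	return (leaf1_ecx & need) == need && (xcr0 & state) == state;
}

/* Hypothetical helper: same idea for AVX-512 Foundation. */
bool
avx512f_usable(uint32_t leaf1_ecx, uint32_t leaf7_ebx, uint64_t xcr0)
{
	const uint64_t state = EX_XCR0_AVX512;

	return avx_usable(leaf1_ecx, xcr0) &&
	    (leaf7_ebx & EX_CPUID7_EBX_AVX512F) != 0 &&
	    (xcr0 & state) == state;
}
```

If the kernel does not save the ZMM state, the XCR0 test fails even
though cpuid reports AVX512F, and the port has to fall back.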

For example, on aarch64 there are still features such as bf16, or
neon with and without fp16, which I assume the OpenBSD kernel is
agnostic to (neon is always present on aarch64 v8 and above).  A port
may want to choose dynamically between fp16, arm half-float, or bf16,
if present.  Therefore the availability of these should be visible.
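
A sketch of what such a dynamic choice might look like in a port,
assuming the two hwcap words have already been obtained from whatever
interface ends up being offered (elf_aux_info or similar).  The bit
values below are the ones Linux uses on arm64; whether OpenBSD would
use the same values is an assumption, and every name here is invented
for the example.  Preferring bf16 over fp16 is an arbitrary choice.

```c
#include <stdint.h>

/* Assumed bit values, matching Linux's arm64 hwcap definitions. */
#define EX_HWCAP_FPHP		(1UL << 9)	/* scalar half-precision */
#define EX_HWCAP_ASIMDHP	(1UL << 10)	/* neon half-precision */
#define EX_HWCAP2_BF16		(1UL << 14)	/* bfloat16 instructions */

enum half_kernel {
	HALF_FP32_FALLBACK,	/* no usable 16-bit float support */
	HALF_FP16,
	HALF_BF16
};

/* Hypothetical selector a port might run once at startup. */
enum half_kernel
pick_half_kernel(unsigned long hwcap, unsigned long hwcap2)
{
	const unsigned long fp16 = EX_HWCAP_FPHP | EX_HWCAP_ASIMDHP;

	if (hwcap2 & EX_HWCAP2_BF16)
		return HALF_BF16;
	/* Require both scalar and neon fp16 before using fp16 code. */
	if ((hwcap & fp16) == fp16)
		return HALF_FP16;
	return HALF_FP32_FALLBACK;
}
```

If the flags are hidden, the port can only ever take the fallback
path.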

If SVE is not available (because context switch does not preserve
the register file) then so be it.  Again the porter has to control
for this.

Whichever solution is used (emulated cpuid instructions, or
HWCAP_CPUID, or printing flags in dmesg for software to parse, or
sysctl), porters will use whatever is most convenient and easiest
to understand, which typically means a slight adjustment of the
Linux method.  It is still a hard slog on both amd64 and aarch64,
especially if the code is weakly documented (such as blis, py3-numpy,
highway).

Providing these multiple methods eases the job of porters.

To be sure, I would support a dmesg output which matches, within
the supported features, the Linux file /proc/cpuinfo for feature
flags.
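
For what it's worth, software that parses such a line tends to do
something like the following.  This is only a sketch: the function
name is invented, and the line format (whitespace-separated flag
names, as in the Linux "Features" line) is an assumption about what
a dmesg line would look like.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `flag` appears as a whole whitespace-separated word
 * in `line`.  Substring hits such as "simd" inside "asimd" are
 * rejected by checking the characters on both sides of each match.
 */
bool
has_feature_flag(const char *line, const char *flag)
{
	size_t flen = strlen(flag);
	const char *p = line;

	if (flen == 0)
		return false;
	while ((p = strstr(p, flag)) != NULL) {
		bool start_ok = p == line || p[-1] == ' ' || p[-1] == '\t';
		char end = p[flen];
		bool end_ok = end == '\0' || end == ' ' ||
		    end == '\t' || end == '\n';

		if (start_ok && end_ok)
			return true;
		p++;	/* keep scanning after a partial match */
	}
	return false;
}
```

On Linux the LSE flag shows up as "atomics", so a port would call
has_feature_flag(line, "atomics") on that line.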

...snip...

>> In particular you may well find that some of these codepaths break
>> branch-target CFI.

Interesting risk there.  Is there somewhere to learn more?

>> This isn't a "performance above everything" OS - I'd
>> argue that reducing the number of variations at runtime (while still
>> supporting some carefully chosen most-useful ones) is actually a
>> benefit for us.
> 
> Agreed.  There are clear benefits to exposing LSE.  But I'm not sure
> there are any real benefits in supporting things like SM4 crypto at
> this moment.

If security features are "complete" (for today's version of
"complete") then it's nice to be able to allow ports to achieve their
best performance possible.


J

PS.  Someone asked for diffs.  I would delay offering these until I
understand where the opportunities for them lie.