From: j@bitminer.ca
Subject: Re: arm64 LSE support in userland: introduce elf_aux_info?
To: Mark Kettenis
Cc: Stuart Henderson, tech@openbsd.org
Date: Fri, 12 Jul 2024 16:08:10 -0400

On 2024-07-12 06:59, Mark Kettenis wrote:
>> Date: Fri, 12 Jul 2024 10:57:40 +0100
>> From: Stuart Henderson
>>
>> On 2024/07/12 11:22, Jeremie Courreges-Anglas wrote:
>> > On Thu, Jul 11, 2024 at 09:17:44AM -0400, j@bitminer.ca wrote:
>> >
>> > > 1) you are filtering the truth and providing a BSD-specific set of
>> > > "interpretations" of the truth.
>> > >
>> > > The "truth" is the contents of CPU-capability registers. Why provide
>> > > an opinionated interpretation of these, instead of just readable
>> > > copies?
>> >
>> > There are features we want to hide from userland and some features
>> > that we want to expose. Among the latter, some features may require
>> > kernel support, and exposing those without kernel support would be
>> > wrong.
>>
>> We already saw that on amd64 with AVX512.

Some features are fairly complicated to understand; e.g. the array of
AVX instructions varies dramatically with implementation (code name of
the CPU). Fortunately cpuid on Intel is unprivileged so software can
decide; if AVX (or a future innovation) isn't supported then the porter
has to understand this and somehow restrict usage. This is another
thing porters have to do aside from library dependencies, code patches,
etc.

Presumably other operating systems have or have had similar
restrictions.

For example on aarch64 there are still the bf16, neon/fp16 and
neon/no-fp16 features, which I assume the OpenBSD kernel is agnostic to
(neon is always present on aarch64 v8 and above). A port may want to
dynamically choose between fp16, arm half-float, or bf16, if present.
Therefore the availability of these should be visible.

If SVE is not available (because context switch does not preserve the
register file) then so be it. Again the porter has to control for this.

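As a rough sketch of the kind of runtime choice a port might make, here
is a small C example. Everything in it is hypothetical: the struct, the
function names and the selection order are made up for illustration,
and the bit positions simply copy Linux's arm64 AT_HWCAP/AT_HWCAP2
layout as an assumption. How the two masks get filled (elf_aux_info,
sysctl, or something else) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * ASSUMPTION: these bit positions copy Linux's arm64 AT_HWCAP/AT_HWCAP2
 * layout for illustration only.  They are not an OpenBSD interface.
 */
#define CAP_ASIMDHP	(UINT64_C(1) << 10)	/* neon with fp16 arithmetic */
#define CAP_SVE		(UINT64_C(1) << 22)	/* SVE reported as usable */
#define CAP2_BF16	(UINT64_C(1) << 14)	/* bf16 instructions */

/* Hypothetical container for whatever the OS chooses to report. */
struct cpu_caps {
	uint64_t hwcap;
	uint64_t hwcap2;
};

enum half_kernel { KERNEL_F32, KERNEL_FP16, KERNEL_BF16 };

/*
 * Choose a code path once at startup.  The order (fp16, then bf16, then
 * plain single precision) is an arbitrary example, not a recommendation.
 */
enum half_kernel
pick_half_kernel(const struct cpu_caps *caps)
{
	assert(caps != NULL);
	if (caps->hwcap & CAP_ASIMDHP)
		return KERNEL_FP16;
	if (caps->hwcap2 & CAP2_BF16)
		return KERNEL_BF16;
	return KERNEL_F32;	/* baseline neon path */
}

/* Use SVE only when the reported mask says so; never probe for it. */
int
may_use_sve(const struct cpu_caps *caps)
{
	assert(caps != NULL);
	return (caps->hwcap & CAP_SVE) != 0;
}
```

The point of the sketch is that the port only ever consults the masks
it is handed, so a kernel that hides a feature (SVE, say) automatically
steers the port onto a path it can actually run.
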
Whichever solution is used (emulated cpuid instructions, or HWCAP_CPUID,
or printing flags in dmesg for software to parse, or sysctl) the porters
will use whatever is most convenient and easy to understand, which
typically means a slight adjustment on the Linux method. It is still a
hard slog on both amd64 and aarch64, especially if the code is weakly
documented (such as blis, py3-numpy, highway). Providing these multiple
methods eases the job of porters.

To be sure, I would support a dmesg output which matches, within the
supported features, the Linux file /proc/cpuinfo for feature flags.

...snip...

>> In particular you may well find that some of these codepaths break
>> branch-target CFI.

Interesting risk there. Is there somewhere to learn more?

>> This isn't a "performance above everything" OS - I'd
>> argue that reducing the number of variations at runtime (while still
>> supporting some carefully chosen most-useful ones) is actually a
>> benefit for us.
>
> Agreed. There are clear benefits to exposing LSE. But I'm not sure
> there are any real benefits in supporting things like SM4 crypto at
> this moment.

If security features are "complete" (for today's version of "complete")
then it's nice to be able to allow ports to achieve the best performance
possible.

J

PS. Someone asked for diffs. I would delay offering these until I
understand where the opportunities lie.