From: Jeremie Courreges-Anglas <jca@wxcvbn.org>
Subject: Re: Emulate CPU ID register access on arm64
To: Mark Kettenis <mark.kettenis@xs4all.nl>
Cc: patrick@openbsd.org, tech@openbsd.org, brad@comstyle.com, naddy@openbsd.org
Date: Tue, 23 Jul 2024 00:44:11 +0200

On Mon, Jul 22, 2024 at 08:30:58PM +0200, Mark Kettenis wrote:
> > Date: Sun, 21 Jul 2024 21:41:47 +0200
> > From: Jeremie Courreges-Anglas <jca@wxcvbn.org>
> > 
> > On Wed, Jul 17, 2024 at 10:40:57PM +0200, Mark Kettenis wrote:
> > > > Date: Sun, 14 Jul 2024 12:52:36 +0200
> > > > From: Jeremie Courreges-Anglas <jca@wxcvbn.org>
> > > 
> > > The code rearrange bit is in, so that leaves the emulation bits.
> > > 
> > > > On Sun, Jul 14, 2024 at 11:25:27AM +0200, Mark Kettenis wrote:
> > > > > As mentioned in the HWCAP discussion, new arm64 bits are typically no
> > > > > longer assigned and instead the HWCAP_CPUID bit is used to signal that
> > > > > the kernel emulates access to the CPU ID registers.  The diff below
> > > > > implements this.
> > > > > 
> > > > > > The architecture is clearly designed to make emulation possible and
> > > > > > easy.  There is a special trap for MSR access and it provides all the
> > > > > > details needed to emulate access without the need to read the
> > > > > instruction.  The trapping from EL0 is probably done to let the OS
> > > > > sanitize the values such that only features relevant to userland and
> > > > > common to all CPU cores in the system are advertised.
> > > > > 
> > > > > There are some open questions though.  The diff is rather strict in
> > > > > what access it emulates.  For now this is only the "known" ID_AA64_xxx
> > > > > registers, that is the ID_AA64_xxx registers that are currently
> > > > > > defined by the architecture.  I think the architecture actually says
> > > > > that the currently undefined ID_AA64_xxx registers should return zero,
> > > > > and Linux allows access to them all.  But I would like to know when
> > > > > userland processes actually start accessing those, so for now we'll
> > > > > SIGILL.
> > > > 
> > > > ack
> > > > 
> > > > > I didn't implement emulation for the "32-bit" ID_xxx registers.  We
> > > > > > don't support executing 32-bit processes so these registers should be
> > > > > irrelevant.  But they can be accessed using arm64 instructions, so for
> > > > > now we'll continue to SIGILL these as well.
> > > > 
> > > > No idea how often those registers might be used.
> > > > 
> > > > > > Then there are the MIDR_EL1 and MPIDR_EL1 registers.  Linux allows
> > > > > access to these registers.  The problem with these is that their
> > > > > values may depend on what CPU a process is executed on.  So in general
> > > > > I don't think userland code should look at these.  However, some
> > > > > userland code looks at these since certain optimizations might apply
> > > > > only to specific CPU implementations.  On amd64, where the CPUID
> > > > > > instruction is available to userland, looking at the CPU family and
> > > > > model bits is accepted practice.
> > > > 
> > > > Looks like killing processes for (at least) MIDR_EL1 access would lead
> > > > to breakage in a bunch of ports.
> > > > 
> > > > For example there is devel/abseil-cpp and the copies of it bundled
> > > > into other ports.  The detection is currently done through
> > > > getauxval(3) but letting this SIGILL would be an inconvenient trap for
> > > > anyone adding detection through elf_aux_info(3).
> > > 
> > > Ok.  So that does a microarchitectural optimization of the CRC
> > > calculation.  Looks like it only does that for hardware that Google
> > > cares about in production though.  So they avoid the issue with having
> > > different cores integrated on a single SoC.
> > > 
> > > So I've implemented MIDR_EL1, MPIDR_EL1 and REVIDR_EL1 in the same way
> > > > as Linux, passing through MIDR_EL1 (the core type) and faking the
> > > other registers.
> > 
> > Cool.  The midr/revidr_el1 behavior matches what's documented at
> > https://docs.kernel.org/arch/arm64/cpu-feature-registers.html
> > 
> > > > https://codesearch.debian.net/search?q=%26.*HWCAP_CPUID&literal=0
> > > > 
> > > > Granted, as long as we don't advertize HWCAP_CPUID your proposal is
> > > > > still more lenient/useful than our current policy of always having
> > > > userland trap.
> > > > 
> > > > > Thoughts?
> > 
> > The emulation code looks good to me according to ARM DDI 0487K.a
> > D22.3.  I can't test this currently, ok jca@ FWIW.
> 
> As naddy@ found out, it actually isn't ok.  The exception that helps
> emulation of the MSR/MRS instructions is only available if FEAT_IDST
> is implemented.  That's a (mandatory) ARMv8.4 feature, so older CPU
> cores don't support it.  On those older CPUs the instructions generate
> a standard "unknown instruction" trap.

sigh, that looked too good to be true.
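
For reference, a rough sketch of what the FEAT_IDST trap hands us,
assuming the ISS layout for exception class 0x18 in ARM DDI 0487.  The
struct and function names are made up for illustration and are not
taken from the diff:

```c
#include <stdbool.h>
#include <stdint.h>

/* ESR_ELx exception class for trapped MSR/MRS/system instructions. */
#define EC_SYS_INSN	0x18

/* Decoded operands of a trapped access; hypothetical struct name. */
struct sysreg_access {
	unsigned op0, op1, crn, crm, op2;
	unsigned rt;		/* general-purpose register number */
	bool	 is_read;	/* true for MRS, false for MSR */
};

/*
 * Fill *sa from an ESR value.  Returns false if the exception class
 * is not the MSR/MRS trap, in which case *sa is left untouched.
 */
static bool
esr_decode_sysreg(uint64_t esr, struct sysreg_access *sa)
{
	uint32_t iss = esr & 0x1ffffff;		/* ISS, bits [24:0] */

	if (((esr >> 26) & 0x3f) != EC_SYS_INSN)
		return false;

	sa->op0 = (iss >> 20) & 0x3;
	sa->op2 = (iss >> 17) & 0x7;
	sa->op1 = (iss >> 14) & 0x7;
	sa->crn = (iss >> 10) & 0xf;
	sa->rt = (iss >> 5) & 0x1f;
	sa->crm = (iss >> 1) & 0xf;
	sa->is_read = (iss & 0x1) != 0;
	return true;
}
```

Without FEAT_IDST the same access shows up as an "unknown instruction"
exception and none of these fields are available.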

> We could implement support for emulating these instructions "the hard
> way", by reading the instruction and decoding it.  However, since
> arm64 implements X-only, using copyin(9) to read the instruction
> doesn't work and we haven't implemented copyinsn(9) on arm64.  And
> implementing copyinsn(9) isn't trivial on arm64.
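
To illustrate what "the hard way" involves once the instruction word
has been fetched (the part that needs copyinsn(9)), here is a minimal
sketch of the decode step.  It assumes the A64 MRS encoding from the
ARM ARM; the names are hypothetical and this is not what Linux or
FreeBSD literally do:

```c
#include <stdbool.h>
#include <stdint.h>

/* MRS Xt, <sysreg>: bits [31:20] are fixed, bit 19 selects op0 2 or 3. */
#define INSN_MRS_MASK	0xfff00000u
#define INSN_MRS	0xd5300000u

/* Hypothetical container for the decoded operands. */
struct mrs_insn {
	unsigned op0, op1, crn, crm, op2, rt;
};

/* Returns false if insn is not an MRS; *mi is then left untouched. */
static bool
insn_decode_mrs(uint32_t insn, struct mrs_insn *mi)
{
	if ((insn & INSN_MRS_MASK) != INSN_MRS)
		return false;

	mi->op0 = 2 + ((insn >> 19) & 0x1);
	mi->op1 = (insn >> 16) & 0x7;
	mi->crn = (insn >> 12) & 0xf;
	mi->crm = (insn >> 8) & 0xf;
	mi->op2 = (insn >> 5) & 0x7;
	mi->rt = insn & 0x1f;
	return true;
}

/* ID_AA64xxx registers live at op0 == 3, op1 == 0, CRn == 0, CRm 4..7. */
static bool
mrs_is_id_aa64(const struct mrs_insn *mi)
{
	return mi->op0 == 3 && mi->op1 == 0 && mi->crn == 0 &&
	    mi->crm >= 4 && mi->crm <= 7;
}
```

For example 0xd5380600 is "mrs x0, id_aa64isar0_el1" and 0xd5380000 is
"mrs x0, midr_el1".
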
> 
> Alternatively we can advertise HWCAP_CPUID only if FEAT_IDST is
> implemented.  That should be ok, since the older CPU cores shouldn't
> have any features that aren't discoverable through AT_HWCAP and
> AT_HWCAP2.  Both Linux and FreeBSD do implement emulation "the hard
> way".  So there is a risk that code will skip checking the HWCAP_CPUID
> before trying to read the ID registers.  Such code will fail to run on
> older CPUs.  But such code would already be failing on what we have
> now.  So I propose we go with this solution.

Makes sense.
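
As I understand the proposal, the gating boils down to something like
the sketch below.  This is an illustration only, not the code from the
diff: the helper name is made up, the HWCAP_CPUID value is assumed to
match the Linux one, and the IDS field position is taken from the
ID_AA64MMFR2_EL1 description in the ARM ARM.

```c
#include <stdint.h>

/* Assumed to be the same bit Linux uses for HWCAP_CPUID in AT_HWCAP. */
#define HWCAP_CPUID	(1UL << 11)

/* ID_AA64MMFR2_EL1.IDS, bits [39:36]; nonzero means FEAT_IDST. */
#define MMFR2_IDS_SHIFT	36
#define MMFR2_IDS_MASK	0xfULL

/*
 * Hypothetical helper: advertise HWCAP_CPUID only when the CPU
 * reports FEAT_IDST, otherwise return hwcap unchanged.
 */
static unsigned long
hwcap_add_cpuid(unsigned long hwcap, uint64_t mmfr2)
{
	if (((mmfr2 >> MMFR2_IDS_SHIFT) & MMFR2_IDS_MASK) != 0)
		hwcap |= HWCAP_CPUID;
	return hwcap;
}
```

On a system with different core types the check would presumably have
to hold for every core before the bit is set.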

> One gotcha with this diff is that we now unconditionally read
> ID_AA64MMFR2_EL1.  This should be fine on real hardware (and return),
> but old versions of QEMU (2.5.1 apparently) will trap.  So without a
> workaround, this diff will break OpenBSD on those old versions of
> QEMU.

meh

> ok?

The additional code changes LGTM.

-- 
jca