Index | Thread | Search

From:
Mark Kettenis <mark.kettenis@xs4all.nl>
Subject:
Re: Emulate CPU ID register access on arm64
To:
Jeremie Courreges-Anglas <jca@wxcvbn.org>
Cc:
patrick@openbsd.org, tech@openbsd.org, brad@comstyle.com, naddy@openbsd.org
Date:
Mon, 22 Jul 2024 20:30:58 +0200

Download raw body.

Thread
> Date: Sun, 21 Jul 2024 21:41:47 +0200
> From: Jeremie Courreges-Anglas <jca@wxcvbn.org>
> 
> On Wed, Jul 17, 2024 at 10:40:57PM +0200, Mark Kettenis wrote:
> > > Date: Sun, 14 Jul 2024 12:52:36 +0200
> > > From: Jeremie Courreges-Anglas <jca@wxcvbn.org>
> > 
> > The code rearrange bit is in, so that leaves the emulation bits.
> > 
> > > On Sun, Jul 14, 2024 at 11:25:27AM +0200, Mark Kettenis wrote:
> > > > As mentioned in the HWCAP discussion, new arm64 bits are typically no
> > > > longer assigned and instead the HWCAP_CPUID bit is used to signal that
> > > > the kernel emulates access to the CPU ID registers.  The diff below
> > > > implements this.
> > > > 
> > > > The architecture is clearly designed to make emulattion possible and
> > > > easy.  There is a special trap for MSR access and it provides all the
> > > > detailt needed to emulate access without the need to read the
> > > > instruction.  The trapping from EL0 is probably done to let the OS
> > > > sanitize the values such that only features relevant to userland and
> > > > common to all CPU cores in the system are advertised.
> > > > 
> > > > There are some open questions though.  The diff is rather strict in
> > > > what access it emulates.  For now this is only the "known" ID_AA64_xxx
> > > > registers, that is the ID_AA64_xxx registers that are currently
> > > > defined by the architectures.  I think the architecture actually says
> > > > that the currently undefined ID_AA64_xxx registers should return zero,
> > > > and Linux allows access to them all.  But I would like to know when
> > > > userland processes actually start accessing those, so for now we'll
> > > > SIGILL.
> > > 
> > > ack
> > > 
> > > > I didn't implement emulation for the "32-bit" ID_xxx registers.  We
> > > > don't support excuting 32-bit processes so these registers should be
> > > > irrelevant.  But they can be accessed using arm64 instructions, so for
> > > > now we'll continue to SIGILL these as well.
> > > 
> > > No idea how often those registers might be used.
> > > 
> > > > Then ther are the MIDR_EL1 and MPIDR_EL1 registers.  Linux allows
> > > > access to these registers.  The problem with these is that their
> > > > values may depend on what CPU a process is executed on.  So in general
> > > > I don't think userland code should look at these.  However, some
> > > > userland code looks at these since certain optimizations might apply
> > > > only to specific CPU implementations.  On amd64, where the CPUID
> > > > instruction is available to userland, looking at the CPU familiy and
> > > > model bits is accepted practice.
> > > 
> > > Looks like killing processes for (at least) MIDR_EL1 access would lead
> > > to breakage in a bunch of ports.
> > > 
> > > For example there is devel/abseil-cpp and the copies of it bundled
> > > into other ports.  The detection is currently done through
> > > getauxval(3) but letting this SIGILL would be an inconvenient trap for
> > > anyone adding detection through elf_aux_info(3).
> > 
> > Ok.  So that does a microarchitectural optimization of the CRC
> > calculation.  Looks like it only does that for hardware that Google
> > cares about in production though.  So they avoid the issue with having
> > different cores integrated on a single SoC.
> > 
> > So I've implemented MIDR_EL1, MPIDR_EL1 and REVIDR_EL1 in the same way
> > as Linux, passing through MIDR)EL1 (the core type) and faking the
> > other registers.
> 
> Cool.  The midr/revidr_el1 behavior matches what's documented at
> https://docs.kernel.org/arch/arm64/cpu-feature-registers.html
> 
> > > https://codesearch.debian.net/search?q=%26.*HWCAP_CPUID&literal=0
> > > 
> > > Granted, as long as we don't advertize HWCAP_CPUID your proposal is
> > > still more lenient/useful than our current policy of always have
> > > userland trap.
> > > 
> > > > Thoughts?
> 
> The emulation code looks good to me according to ARM DDI 0487K.a
> D22.3.  I can't test this currently, ok jca@ FWIW.

As naddy@ found out, it actually isn't ok.  The exception that helps
emulation of the MSR/MRS instructions is only available if FEAT_IDST
is implemented.  That's an (mandatory) ARMv8.4 feature, so older CPU
cores don't support it.  On those older CPUs the instructions generate
a standard "unknown instruction" trap.

We could implement support for emulating these instructions "the hard
way", by reading the instruction and decoding it.  However, since
arm64 implements X-only, using copyin(9) to read the instruction
doesn't work and we haven't implemented copyinsn(9) on arm64.  And
implementing copyinsn(9) isn't trivial on arm64.

Alternatively we can advertise HWCAP_CPUID only if FEAT_IDST is
implemented.  That should be ok, since the older CPU cores shouldn't
have any features that aren't discoverable through AT_HWCAP and
AT_HWCAP2.  Both Linux and FreeBSD do implement emulation "the hard
way".  So there is a risk that code will skip checking the HWCAP_CPUID
before trying to read the ID registers.  Such code will fail to run on
older CPUs.  But such code would already be failing on what we have
now.  So I propose we go with this solution.

One gotcha with this diff is that we now unconditionally read
ID_AA64MMFR2_EL1.  This should be fine on real hardware (and return),
but old versions of QEMU (2.5.1 apparently) will trap.  So without a
workaround, this diff will break OpenBSD on those old versions of
QEMU.

ok?


Index: arch/arm64/arm64/cpu.c
===================================================================
RCS file: /cvs/src/sys/arch/arm64/arm64/cpu.c,v
retrieving revision 1.129
diff -u -p -r1.129 cpu.c
--- arch/arm64/arm64/cpu.c	21 Jul 2024 18:57:31 -0000	1.129
+++ arch/arm64/arm64/cpu.c	22 Jul 2024 18:25:39 -0000
@@ -242,6 +242,9 @@ int cpu_node;
 uint64_t cpu_id_aa64isar0;
 uint64_t cpu_id_aa64isar1;
 uint64_t cpu_id_aa64isar2;
+uint64_t cpu_id_aa64mmfr0;
+uint64_t cpu_id_aa64mmfr1;
+uint64_t cpu_id_aa64mmfr2;
 uint64_t cpu_id_aa64pfr0;
 uint64_t cpu_id_aa64pfr1;
 
@@ -487,6 +490,7 @@ cpu_identify(struct cpu_info *ci)
 	static uint64_t prev_id_aa64isar2;
 	static uint64_t prev_id_aa64mmfr0;
 	static uint64_t prev_id_aa64mmfr1;
+	static uint64_t prev_id_aa64mmfr2;
 	static uint64_t prev_id_aa64pfr0;
 	static uint64_t prev_id_aa64pfr1;
 	uint64_t midr, impl, part;
@@ -642,6 +646,7 @@ cpu_identify(struct cpu_info *ci)
 	    READ_SPECIALREG(id_aa64isar2_el1) == prev_id_aa64isar2 &&
 	    READ_SPECIALREG(id_aa64mmfr0_el1) == prev_id_aa64mmfr0 &&
 	    READ_SPECIALREG(id_aa64mmfr1_el1) == prev_id_aa64mmfr1 &&
+	    READ_SPECIALREG(id_aa64mmfr2_el1) == prev_id_aa64mmfr2 &&
 	    READ_SPECIALREG(id_aa64pfr0_el1) == prev_id_aa64pfr0 &&
 	    READ_SPECIALREG(id_aa64pfr1_el1) == prev_id_aa64pfr1)
 		return;
@@ -662,6 +667,18 @@ cpu_identify(struct cpu_info *ci)
 		printf("\n%s: mismatched ID_AA64ISAR2_EL1",
 		    ci->ci_dev->dv_xname);
 	}
+	if (READ_SPECIALREG(id_aa64mmfr0_el1) != cpu_id_aa64mmfr0) {
+		printf("\n%s: mismatched ID_AA64MMFR0_EL1",
+		    ci->ci_dev->dv_xname);
+	}
+	if (READ_SPECIALREG(id_aa64mmfr1_el1) != cpu_id_aa64mmfr1) {
+		printf("\n%s: mismatched ID_AA64MMFR1_EL1",
+		    ci->ci_dev->dv_xname);
+	}
+	if (READ_SPECIALREG(id_aa64mmfr2_el1) != cpu_id_aa64mmfr2) {
+		printf("\n%s: mismatched ID_AA64MMFR2_EL1",
+		    ci->ci_dev->dv_xname);
+	}
 	id = READ_SPECIALREG(id_aa64pfr0_el1);
 	/* Allow CSV2/CVS3 to be different. */
 	id &= ~ID_AA64PFR0_CSV2_MASK;
@@ -939,6 +956,16 @@ cpu_identify(struct cpu_info *ci)
 	}
 
 	/*
+	 * ID_AA64MMFR2
+	 */
+	id = READ_SPECIALREG(id_aa64mmfr2_el1);
+
+	if (ID_AA64MMFR2_IDS(id) >= ID_AA64MMFR2_IDS_IMPL) {
+		printf("%sIDS", sep);
+		sep = ",";
+	}
+
+	/*
 	 * ID_AA64PFR0
 	 */
 	id = READ_SPECIALREG(id_aa64pfr0_el1);
@@ -989,6 +1016,7 @@ cpu_identify(struct cpu_info *ci)
 	prev_id_aa64isar2 = READ_SPECIALREG(id_aa64isar2_el1);
 	prev_id_aa64mmfr0 = READ_SPECIALREG(id_aa64mmfr0_el1);
 	prev_id_aa64mmfr1 = READ_SPECIALREG(id_aa64mmfr1_el1);
+	prev_id_aa64mmfr2 = READ_SPECIALREG(id_aa64mmfr2_el1);
 	prev_id_aa64pfr0 = READ_SPECIALREG(id_aa64pfr0_el1);
 	prev_id_aa64pfr1 = READ_SPECIALREG(id_aa64pfr1_el1);
 
@@ -1023,6 +1051,7 @@ cpu_identify(struct cpu_info *ci)
 void
 cpu_identify_cleanup(void)
 {
+	uint64_t id_aa64mmfr2;
 	uint64_t value;
 
 	/* ID_AA64ISAR0_EL1 */
@@ -1040,6 +1069,15 @@ cpu_identify_cleanup(void)
 	value &= ~ID_AA64ISAR2_CLRBHB_MASK;
 	cpu_id_aa64isar2 = value;
 
+	/* ID_AA64MMFR0_EL1 */
+	cpu_id_aa64mmfr0 = 0;
+
+	/* ID_AA64MMFR1_EL1 */
+	cpu_id_aa64mmfr1 = 0;
+
+	/* ID_AA64MMFR2_EL1 */
+	cpu_id_aa64mmfr2 = 0;
+
 	/* ID_AA64PFR0_EL1 */
 	value = 0;
 	value |= cpu_id_aa64pfr0 & ID_AA64PFR0_FP_MASK;
@@ -1071,7 +1109,9 @@ cpu_identify_cleanup(void)
 		hwcap |= HWCAP_ATOMICS;
 	/* HWCAP_FPHP */
 	/* HWCAP_ASIMDHP */
-	/* HWCAP_CPUID */
+	id_aa64mmfr2 = READ_SPECIALREG(id_aa64mmfr2_el1);
+	if (ID_AA64MMFR2_IDS(id_aa64mmfr2) >= ID_AA64MMFR2_IDS_IMPL)
+		hwcap |= HWCAP_CPUID;
 	if (ID_AA64ISAR0_RDM(cpu_id_aa64isar0) >= ID_AA64ISAR0_RDM_IMPL)
 		hwcap |= HWCAP_ASIMDRDM;
 	if (ID_AA64ISAR1_JSCVT(cpu_id_aa64isar1) >= ID_AA64ISAR1_JSCVT_IMPL)
@@ -1271,6 +1311,9 @@ cpu_attach(struct device *parent, struct
 		cpu_id_aa64isar0 = READ_SPECIALREG(id_aa64isar0_el1);
 		cpu_id_aa64isar1 = READ_SPECIALREG(id_aa64isar1_el1);
 		cpu_id_aa64isar2 = READ_SPECIALREG(id_aa64isar2_el1);
+		cpu_id_aa64mmfr0 = READ_SPECIALREG(id_aa64mmfr0_el1);
+		cpu_id_aa64mmfr1 = READ_SPECIALREG(id_aa64mmfr1_el1);
+		cpu_id_aa64mmfr2 = READ_SPECIALREG(id_aa64mmfr2_el1);
 		cpu_id_aa64pfr0 = READ_SPECIALREG(id_aa64pfr0_el1);
 		cpu_id_aa64pfr1 = READ_SPECIALREG(id_aa64pfr1_el1);
 
Index: arch/arm64/arm64/machdep.c
===================================================================
RCS file: /cvs/src/sys/arch/arm64/arm64/machdep.c,v
retrieving revision 1.91
diff -u -p -r1.91 machdep.c
--- arch/arm64/arm64/machdep.c	17 Jul 2024 15:21:59 -0000	1.91
+++ arch/arm64/arm64/machdep.c	22 Jul 2024 18:25:39 -0000
@@ -360,8 +360,11 @@ cpu_sysctl(int *name, u_int namelen, voi
 	case CPU_ID_AA64PFR1:
 		return sysctl_rdquad(oldp, oldlenp, newp, cpu_id_aa64pfr1);
 	case CPU_ID_AA64MMFR0:
+		return sysctl_rdquad(oldp, oldlenp, newp, cpu_id_aa64mmfr0);
 	case CPU_ID_AA64MMFR1:
+		return sysctl_rdquad(oldp, oldlenp, newp, cpu_id_aa64mmfr1);
 	case CPU_ID_AA64MMFR2:
+		return sysctl_rdquad(oldp, oldlenp, newp, cpu_id_aa64mmfr2);
 	case CPU_ID_AA64SMFR0:
 	case CPU_ID_AA64ZFR0:
 		return sysctl_rdquad(oldp, oldlenp, newp, 0);
Index: arch/arm64/arm64/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/arm64/arm64/trap.c,v
retrieving revision 1.48
diff -u -p -r1.48 trap.c
--- arch/arm64/arm64/trap.c	21 Feb 2024 15:53:07 -0000	1.48
+++ arch/arm64/arm64/trap.c	22 Jul 2024 18:25:39 -0000
@@ -187,6 +187,99 @@ kdata_abort(struct trapframe *frame, uin
 	}
 }
 
+static int
+emulate_msr(struct trapframe *frame, uint64_t esr)
+{
+	u_int rt = ISS_MSR_Rt(esr);
+	uint64_t val;
+
+	/* Only emulate reads. */
+	if ((esr & ISS_MSR_DIR) == 0)
+		return 0;
+
+	/* Only emulate non-debug System register access. */
+	if (ISS_MSR_OP0(esr) != 3 || ISS_MSR_OP1(esr) != 0 ||
+	    ISS_MSR_CRn(esr) != 0)
+		return 0;
+
+	switch (ISS_MSR_CRm(esr)) {
+	case 0:
+		switch (ISS_MSR_OP2(esr)) {
+		case 0:		/* MIDR_EL1 */
+			val = READ_SPECIALREG(midr_el1);
+			break;
+		case 5:		/* MPIDR_EL1 */
+			/*
+			 * Don't reveal the topology to userland.  But
+			 * return a valid value; Bit 31 is RES1.
+			 */
+			val = 0x80000000;
+			break;
+		case 6:		/* REVIDR_EL1 */
+			val = 0;
+			break;
+		default:
+			return 0;
+		}
+		break;
+	case 4:
+		switch (ISS_MSR_OP2(esr)) {
+		case 0:		/* ID_AA64PFR0_EL1 */
+			val = cpu_id_aa64pfr0;
+			break;
+		case 1:		/* ID_AA64PFR1_EL1 */
+			val = cpu_id_aa64pfr1;
+			break;
+		case 2:		/* ID_AA64PFR2_EL1 */
+		case 4:		/* ID_AA64ZFR0_EL1 */
+		case 5:		/* ID_AA64SMFR0_EL1 */
+			val = 0;
+			break;
+		default:
+			return 0;
+		}
+		break;
+	case 6:
+		switch (ISS_MSR_OP2(esr)) {
+		case 0:	/* ID_AA64ISAR0_EL1 */
+			val = cpu_id_aa64isar0;
+			break;
+		case 1: /* ID_AA64ISAR1_EL1 */
+			val = cpu_id_aa64isar1;
+			break;
+		case 2: /* ID_AA64ISAR2_EL2 */
+			val = cpu_id_aa64isar2;
+			break;
+		default:
+			return 0;
+		}
+		break;
+	case 7:
+		switch (ISS_MSR_OP2(esr)) {
+		case 0: /* ID_AA64MMFR0_EL1 */
+		case 1: /* ID_AA64MMFR1_EL1 */
+		case 2: /* ID_AA64MMFR2_EL1 */
+		case 3: /* ID_AA64MMFR3_EL1 */
+		case 4: /* ID_AA64MMFR4_EL1 */
+			val = 0;
+			break;
+		default:
+			return 0;
+		}
+		break;
+	default:
+		return 0;
+	}
+
+	if (rt < 30)
+		frame->tf_x[rt] = val;
+	else if (rt == 30)
+		frame->tf_lr = val;
+	frame->tf_elr += 4;
+
+	return 1;
+}
+
 void
 do_el1h_sync(struct trapframe *frame)
 {
@@ -288,6 +381,10 @@ do_el0_sync(struct trapframe *frame)
 		sv.sival_ptr = (void *)frame->tf_elr;
 		trapsignal(p, SIGILL, esr, ILL_BTCFI, sv);
 		break;
+	case EXCP_MSR:
+		if (emulate_msr(frame, esr))
+			break;
+		/* FALLTHROUGH */
 	case EXCP_FPAC:
 		curcpu()->ci_flush_bp();
 		sv.sival_ptr = (void *)frame->tf_elr;
Index: arch/arm64/include/armreg.h
===================================================================
RCS file: /cvs/src/sys/arch/arm64/include/armreg.h,v
retrieving revision 1.35
diff -u -p -r1.35 armreg.h
--- arch/arm64/include/armreg.h	23 Jun 2024 10:17:16 -0000	1.35
+++ arch/arm64/include/armreg.h	22 Jul 2024 18:25:40 -0000
@@ -171,6 +171,26 @@
 #define	 ISS_DATA_DFSC_ECC_L3	(0x1f << 0)
 #define	 ISS_DATA_DFSC_ALIGN	(0x21 << 0)
 #define	 ISS_DATA_DFSC_TLB_CONFLICT (0x30 << 0)
+#define	 ISS_MSR_DIR_SHIFT	0
+#define	 ISS_MSR_DIR		(0x01 << ISS_MSR_DIR_SHIFT)
+#define	 ISS_MSR_Rt_SHIFT	5
+#define	 ISS_MSR_Rt_MASK	(0x1f << ISS_MSR_Rt_SHIFT)
+#define	 ISS_MSR_Rt(x)		(((x) & ISS_MSR_Rt_MASK) >> ISS_MSR_Rt_SHIFT)
+#define	 ISS_MSR_CRm_SHIFT	1
+#define	 ISS_MSR_CRm_MASK	(0xf << ISS_MSR_CRm_SHIFT)
+#define	 ISS_MSR_CRm(x)		(((x) & ISS_MSR_CRm_MASK) >> ISS_MSR_CRm_SHIFT)
+#define	 ISS_MSR_CRn_SHIFT	10
+#define	 ISS_MSR_CRn_MASK	(0xf << ISS_MSR_CRn_SHIFT)
+#define	 ISS_MSR_CRn(x)		(((x) & ISS_MSR_CRn_MASK) >> ISS_MSR_CRn_SHIFT)
+#define	 ISS_MSR_OP1_SHIFT	14
+#define	 ISS_MSR_OP1_MASK	(0x7 << ISS_MSR_OP1_SHIFT)
+#define	 ISS_MSR_OP1(x)		(((x) & ISS_MSR_OP1_MASK) >> ISS_MSR_OP1_SHIFT)
+#define	 ISS_MSR_OP2_SHIFT	17
+#define	 ISS_MSR_OP2_MASK	(0x7 << ISS_MSR_OP2_SHIFT)
+#define	 ISS_MSR_OP2(x)		(((x) & ISS_MSR_OP2_MASK) >> ISS_MSR_OP2_SHIFT)
+#define	 ISS_MSR_OP0_SHIFT	20
+#define	 ISS_MSR_OP0_MASK	(0x3 << ISS_MSR_OP0_SHIFT)
+#define	 ISS_MSR_OP0(x)		(((x) & ISS_MSR_OP0_MASK) >> ISS_MSR_OP0_SHIFT)
 #define	ESR_ELx_IL		(0x01 << 25)
 #define	ESR_ELx_EC_SHIFT	26
 #define	ESR_ELx_EC_MASK		(0x3f << 26)
@@ -537,6 +557,10 @@
 #define	ID_AA64MMFR2_CCIDX_MASK		(0xfULL << ID_AA64MMFR2_CCIDX_SHIFT)
 #define	ID_AA64MMFR2_CCIDX(x)		((x) & ID_AA64MMFR2_CCIDX_MASK)
 #define	 ID_AA64MMFR2_CCIDX_IMPL	(0x1ULL << ID_AA64MMFR2_CCIDX_SHIFT)
+#define	ID_AA64MMFR2_IDS_SHIFT		36
+#define	ID_AA64MMFR2_IDS_MASK		(0xfULL << ID_AA64MMFR2_IDS_SHIFT)
+#define	ID_AA64MMFR2_IDS(x)		((x) & ID_AA64MMFR2_IDS_MASK)
+#define	 ID_AA64MMFR2_IDS_IMPL		(0x1ULL << ID_AA64MMFR2_IDS_SHIFT)
 
 /* ID_AA64PFR0_EL1 */
 #define	ID_AA64PFR0_MASK		0xff0fffffffffffffULL
Index: arch/arm64/include/cpu.h
===================================================================
RCS file: /cvs/src/sys/arch/arm64/include/cpu.h,v
retrieving revision 1.49
diff -u -p -r1.49 cpu.h
--- arch/arm64/include/cpu.h	17 Jul 2024 15:21:59 -0000	1.49
+++ arch/arm64/include/cpu.h	22 Jul 2024 18:25:40 -0000
@@ -64,6 +64,9 @@
 extern uint64_t cpu_id_aa64isar0;
 extern uint64_t cpu_id_aa64isar1;
 extern uint64_t cpu_id_aa64isar2;
+extern uint64_t cpu_id_aa64mmfr0;
+extern uint64_t cpu_id_aa64mmfr1;
+extern uint64_t cpu_id_aa64mmfr2;
 extern uint64_t cpu_id_aa64pfr0;
 extern uint64_t cpu_id_aa64pfr1;