Index | Thread | Search

From:
Dave Voutila <dv@sisu.io>
Subject:
Re: Allow vmm guests to save/restore pkru via xsave region
To:
tech@openbsd.org
Cc:
mlarkin@openbsd.org, adrianali@fortix.com.ar
Date:
Wed, 20 Aug 2025 16:12:07 -0400

Download raw body.

Thread
Dave Voutila <dv@sisu.io> writes:

> Dave Voutila <dv@sisu.io> writes:
>
>> We're currently letting a guest see the PKU bit via cpuid(0x7, 0) but
>> then not exposing the XSAVE area bit for PKRU via cpuid(0xd, 0).
>>
>> This is fine for OpenBSD guests as we don't use XSAVE/XRSTOR to
>> save/load PKRU state on context switches given how we use PKU today for
>> x-only on Intel. (We explicitly use RDPKRU/WRPKRU.)
>>
>> Newer Linux guests see PKU support and then freak out when the support
>> for XSAVE doesn't exist. This puts them into a very "fun" state where
>> they don't panic but instead thrash about like a lunatic. It's sad,
>> really. See [1] for details.
>>
>> This diff exposes the cpuid bit for the PKRU xsave area if the host has
>> enabled PKU mode. It also adds handling to let the guest set the xcr0
>> bit to let XSAVE/XRSTOR work for the PKRU state.
>>
>> It does not change host PKU or XSAVE behavior.
>>
>> The cpuid emulation change is a bit hacky in reporting xsave area max
>> size and current size, but since the PKRU state comes after all the
>> AVX512 stuff it's basically just needing to use the size of `struct
>> savefpu` plus 64 bits (32 for PKRU, 32 for padding/alignment).
>
> Had a slight mistake in previous diff and was adding 64 bytes to the
> end, not 64 bits. Doh. Now this looks less like it's using a magic
> number of 64 and instead uses a uint64_t.
>

This version makes a comment more intelligible and fixes what looks like
a related issue in the xsetbv handler (not masking to just the bits in
the xsave_mask that are XCR0 related).

>>
>> ok? feedback?
>>
>> [1] https://marc.info/?l=openbsd-misc&m=175505530816494&w=2
>>
>>

diff refs/heads/master refs/heads/vmm-xsave-pku
commit - b981b677995ab0627cfaec84db748a34dc362f99
commit + 8808090b6a2e33830bfce786baad06590e4edbd1
blob - bd1909f33280ca49fa1c546c313f54ac3a52d9b1
blob + 471b12b223a8b12fdca49d7dd21df999516b6e23
--- sys/arch/amd64/amd64/vmm_machdep.c
+++ sys/arch/amd64/amd64/vmm_machdep.c
@@ -3468,8 +3468,8 @@ vmm_fpusave(struct vcpu *vcpu)
 	}

 	/*
-	 * Save full copy of FPU state - guest content is always
-	 * a subset of host's save area (see xsetbv exit handler)
+	 * Save a copy of FPU state - guest content is always
+	 * a subset of host's save area. PKRU is saved separately.
 	 */
 	fpusavereset(&vcpu->vc_g_fpu);
 	vcpu->vc_fpuinited = 1;
@@ -5927,7 +5927,7 @@ svm_handle_xsetbv(struct vcpu *vcpu)
 int
 vmm_handle_xsetbv(struct vcpu *vcpu, uint64_t *rax)
 {
-	uint64_t *rdx, *rcx, val;
+	uint64_t *rdx, *rcx, val, mask = xsave_mask & XFEATURE_XCR0_MASK;

 	rcx = &vcpu->vc_gueststate.vg_rcx;
 	rdx = &vcpu->vc_gueststate.vg_rdx;
@@ -5943,8 +5943,12 @@ vmm_handle_xsetbv(struct vcpu *vcpu, uint64_t *rax)
 		return (vmm_inject_gp(vcpu));
 	}

+	/* If we're exposing PKRU features, allow guests to set PKRU in xcr0. */
+	if (vmm_softc->sc_md.pkru_enabled)
+		mask |= XFEATURE_PKRU;
+
 	val = *rax + (*rdx << 32);
-	if (val & ~xsave_mask) {
+	if (val & ~mask) {
 		DPRINTF("%s: guest specified xcr0 outside xsave_mask %lld\n",
 		    __func__, val);
 		return (vmm_inject_gp(vcpu));
@@ -6187,6 +6191,19 @@ vmm_handle_cpuid_0xd(struct vcpu *vcpu, uint32_t suble
 		}
 		eax = xsave_mask & XFEATURE_XCR0_MASK;
 		edx = (xsave_mask & XFEATURE_XCR0_MASK) >> 32;
+
+		/*
+		 * Emulate support for the pkru xsave region if the
+		 * host has pku enabled. This allow guests to enable
+		 * it in xcr0 and use xsave/xrstor on context switches
+		 * to save or restore pkru.
+		 */
+		if (vmm_softc->sc_md.pkru_enabled) {
+			eax |= XFEATURE_PKRU;
+			ecx = sizeof(struct savefpu) + sizeof(uint64_t);
+			if (xcr0 & XFEATURE_PKRU)
+				ebx = ecx;
+		}
 	} else if (subleaf == 1) {
 		/* mask out XSAVEC, XSAVES, and XFD support */
 		eax &= XSAVE_XSAVEOPT | XSAVE_XGETBV1;