From: Dave Voutila
Subject: Allow vmm guests to save/restore pkru via xsave region
To: tech@openbsd.org
Cc: mlarkin@openbsd.org, adrianali@fortix.com.ar
Date: Fri, 15 Aug 2025 18:53:48 -0400

We're currently letting a guest see the PKU bit via cpuid(0x7, 0), but
then not exposing the XSAVE area bit for PKRU via cpuid(0xd, 0). This
is fine for OpenBSD guests, as we don't use XSAVE/XRSTOR to save/load
PKRU state on context switches given how we use PKU today for x-only on
Intel. (We explicitly use RDPKRU/WRPKRU.)

Newer Linux guests see PKU support and then freak out when the support
for XSAVE doesn't exist. This puts them into a very "fun" state where
they don't panic, but instead thrash about like a lunatic. It's sad,
really. See [1] for details.

This diff exposes the cpuid bit for the PKRU xsave area if the host has
enabled PKU mode. It also adds handling to let the guest set the xcr0
bit so XSAVE/XRSTOR work for the PKRU state. It does not change host
PKU or XSAVE behavior.

The cpuid emulation change is a bit hacky in how it reports the xsave
area max size and current size, but since the PKRU state comes after
all the AVX512 state, it's just a matter of using the size of `struct
savefpu` plus 64 bits (32 for PKRU, 32 for padding/alignment).

ok? feedback?
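For anyone skimming the diff below: the xsetbv change amounts to
subset-testing the guest's requested xcr0 against the host's xsave
mask, optionally widened with the PKRU bit when we expose PKU. A
standalone sketch of that check (the helper name and simplified masks
are mine; XFEATURE_PKRU is bit 9 per the Intel SDM):

```c
#include <assert.h>
#include <stdint.h>

#define XFEATURE_X87	(1ULL << 0)
#define XFEATURE_SSE	(1ULL << 1)
#define XFEATURE_AVX	(1ULL << 2)
#define XFEATURE_PKRU	(1ULL << 9)	/* bit 9, per Intel SDM vol. 1 */

/*
 * Hypothetical helper: would this guest xcr0 write be accepted, or
 * should we inject #GP? Mirrors the mask-widening in the diff.
 */
static int
xcr0_ok(uint64_t val, uint64_t host_mask, int pkru_enabled)
{
	uint64_t mask = host_mask;

	if (pkru_enabled)
		mask |= XFEATURE_PKRU;

	/* Any bit outside the (possibly widened) mask -> reject. */
	return ((val & ~mask) == 0);
}
```

So a guest setting the PKRU bit only passes the check when the host has
PKU enabled; otherwise it takes the same #GP path as any other
out-of-mask bit.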
[1] https://marc.info/?l=openbsd-misc&m=175505530816494&w=2

diff refs/heads/ugh refs/heads/vmm-xsave-pku
commit - 86e813bbfc617cf34c990962d64537d38e459d3d
commit + 2ed79f99ed9cb7ebd79a067c80332dc033fb6ffb
blob - bd1909f33280ca49fa1c546c313f54ac3a52d9b1
blob + 5021bdd6aaeb50d33dca12986277628ec75f6b14
--- sys/arch/amd64/amd64/vmm_machdep.c
+++ sys/arch/amd64/amd64/vmm_machdep.c
@@ -5927,7 +5927,7 @@ svm_handle_xsetbv(struct vcpu *vcpu)
 int
 vmm_handle_xsetbv(struct vcpu *vcpu, uint64_t *rax)
 {
-	uint64_t *rdx, *rcx, val;
+	uint64_t *rdx, *rcx, val, mask = xsave_mask;
 
 	rcx = &vcpu->vc_gueststate.vg_rcx;
 	rdx = &vcpu->vc_gueststate.vg_rdx;
@@ -5943,8 +5943,12 @@ vmm_handle_xsetbv(struct vcpu *vcpu, uint64_t *rax)
 		return (vmm_inject_gp(vcpu));
 	}
 
+	/* If we're exposing PKRU features, allow guests to set PKRU in xcr0. */
+	if (vmm_softc->sc_md.pkru_enabled)
+		mask |= XFEATURE_PKRU;
+
 	val = *rax + (*rdx << 32);
-	if (val & ~xsave_mask) {
+	if (val & ~mask) {
 		DPRINTF("%s: guest specified xcr0 outside xsave_mask %lld\n",
 		    __func__, val);
 		return (vmm_inject_gp(vcpu));
@@ -6187,6 +6191,22 @@ vmm_handle_cpuid_0xd(struct vcpu *vcpu, uint32_t suble
 		}
 		eax = xsave_mask & XFEATURE_XCR0_MASK;
 		edx = (xsave_mask & XFEATURE_XCR0_MASK) >> 32;
+
+		/*
+		 * We don't currently use XSAVE to store/restore PKRU,
+		 * but some guests may expect to do so. If PKE is
+		 * supported, the PKRU feature bit should be 1.
+		 *
+		 * This also means adjusting the reported sizes of the
+		 * XSAVE area as it requires an additional 32 bits, but
+		 * also needs to be 64-bit aligned.
+		 */
+		if (vmm_softc->sc_md.pkru_enabled) {
+			eax |= XFEATURE_PKRU;
+			ecx = sizeof(struct savefpu) + 64;
+			if (xcr0 & XFEATURE_PKRU)
+				ebx = ecx;
+		}
 	} else if (subleaf == 1) {
 		/* mask out XSAVEC, XSAVES, and XFD support */
 		eax &= XSAVE_XSAVEOPT | XSAVE_XGETBV1;