From: Dave Voutila
Subject: Re: Allow vmm guests to save/restore pkru via xsave region
To: tech@openbsd.org
Cc: mlarkin@openbsd.org, adrianali@fortix.com.ar
Date: Sat, 16 Aug 2025 07:24:20 -0400

Dave Voutila writes:

> We're currently letting a guest see the PKU bit via cpuid(0x7, 0) but
> then not exposing the XSAVE area bit for PKRU via cpuid(0xd, 0).
>
> This is fine for OpenBSD guests, as we don't use XSAVE/XRSTOR to
> save/load PKRU state on context switches given how we use PKU today for
> x-only on Intel. (We explicitly use RDPKRU/WRPKRU.)
>
> Newer Linux guests see PKU support and then freak out when the support
> for XSAVE doesn't exist. This puts them into a very "fun" state where
> they don't panic but instead thrash about like a lunatic. It's sad,
> really. See [1] for details.
>
> This diff exposes the cpuid bit for the PKRU xsave area if the host has
> enabled PKU mode. It also adds handling to let the guest set the xcr0
> bit so XSAVE/XRSTOR work for the PKRU state.
>
> It does not change host PKU or XSAVE behavior.
>
> The cpuid emulation change is a bit hacky in reporting the xsave area's
> max size and current size, but since the PKRU state comes after all the
> AVX512 stuff it basically just needs the size of `struct savefpu` plus
> 64 bits (32 for PKRU, 32 for padding/alignment).

Had a slight mistake in the previous diff: I was adding 64 bytes to the
end, not 64 bits. Doh. Now it looks less like it's using a magic number
of 64 and instead uses a uint64_t.

> ok? feedback?
>
> [1] https://marc.info/?l=openbsd-misc&m=175505530816494&w=2

diff refs/heads/master refs/heads/vmm-xsave-pku
commit - db24b9d97dea88c73b2eb08398fbba65a476b372
commit + da654a7ea2327e65f14897e37e6ebf5cdf1ea093
blob - bd1909f33280ca49fa1c546c313f54ac3a52d9b1
blob + b88ea6b58eefbb3884ec4b74880d0e59700a5c31
--- sys/arch/amd64/amd64/vmm_machdep.c
+++ sys/arch/amd64/amd64/vmm_machdep.c
@@ -5927,7 +5927,7 @@ svm_handle_xsetbv(struct vcpu *vcpu)
 int
 vmm_handle_xsetbv(struct vcpu *vcpu, uint64_t *rax)
 {
-	uint64_t *rdx, *rcx, val;
+	uint64_t *rdx, *rcx, val, mask = xsave_mask;
 
 	rcx = &vcpu->vc_gueststate.vg_rcx;
 	rdx = &vcpu->vc_gueststate.vg_rdx;
@@ -5943,8 +5943,12 @@ vmm_handle_xsetbv(struct vcpu *vcpu, uint64_t *rax)
 		return (vmm_inject_gp(vcpu));
 	}
 
+	/* If we're exposing PKRU features, allow guests to set PKRU in xcr0. */
+	if (vmm_softc->sc_md.pkru_enabled)
+		mask |= XFEATURE_PKRU;
+
 	val = *rax + (*rdx << 32);
-	if (val & ~xsave_mask) {
+	if (val & ~mask) {
 		DPRINTF("%s: guest specified xcr0 outside xsave_mask %lld\n",
 		    __func__, val);
 		return (vmm_inject_gp(vcpu));
@@ -6187,6 +6191,22 @@ vmm_handle_cpuid_0xd(struct vcpu *vcpu, uint32_t suble
 		}
 		eax = xsave_mask & XFEATURE_XCR0_MASK;
 		edx = (xsave_mask & XFEATURE_XCR0_MASK) >> 32;
+
+		/*
+		 * We don't currently use XSAVE to store/restore PKRU,
+		 * but some guests may expect to do so. If PKE is
+		 * supported, the PKRU feature bit should be 1.
+		 *
+		 * This also means adjusting the reported sizes of the
+		 * XSAVE area as it requires an additional 32 bits, but
+		 * also needs to be 64-bit aligned.
+		 */
+		if (vmm_softc->sc_md.pkru_enabled) {
+			eax |= XFEATURE_PKRU;
+			ecx = sizeof(struct savefpu) + sizeof(uint64_t);
+			if (xcr0 & XFEATURE_PKRU)
+				ebx = ecx;
+		}
 	} else if (subleaf == 1) {
 		/* mask out XSAVEC, XSAVES, and XFD support */
 		eax &= XSAVE_XSAVEOPT | XSAVE_XGETBV1;