From: Hans-Jörg Höxer Subject: Re: [EXT] Re: Allow vmm guests to save/restore pkru via xsave region To: Date: Thu, 28 Aug 2025 21:25:30 +0200 Hi, On Wed, Aug 20, 2025 at 04:12:07PM -0400, Dave Voutila wrote: > Dave Voutila writes: > > > Dave Voutila writes: > > > >> We're currently letting a guest see the PKU bit via cpuid(0x7, 0) but > >> then not exposing the XSAVE area bit for PKRU via cpuid(0xd, 0). > >> > >> This is fine for OpenBSD guests as we don't use XSAVE/XRSTOR to > >> save/load PKRU state on context switches given how we use PKU today for > >> x-only on Intel. (We explicitly use RDPKRU/WRPKRU.) > >> > >> Newer Linux guests see PKU support and then freak out when the support > >> for XSAVE doesn't exist. This puts them into a very "fun" state where > >> they don't panic but instead thrash about like a lunatic. It's sad, > >> really. See [1] for details. > >> > >> This diff exposes the cpuid bit for the PKRU xsave area if the host has > >> enabled PKU mode. It also adds handling to let the guest set the xcr0 > >> bit to let XSAVE/XRSTOR work for the PKRU state. > >> > >> It does not change host PKU or XSAVE behavior. > >> > >> The cpuid emulation change is a bit hacky in reporting xsave area max > >> size and current size, but since the PKRU state comes after all the > >> AVX512 stuff it's basically just needing to use the size of `struct > >> savefpu` plus 64 bits (32 for PKRU, 32 for padding/alignment). > > > > Had a slight mistake in previous diff and was adding 64 bytes to the > > end, not 64 bits. Doh. Now this looks less like it's using a magic > > number of 64 and instead uses a uint64_t. > > > > This version makes a comment more intelligible and fixes what looks like > a related issue in the xsetbv handler (not masking to just the bits in > the xsave_mask that are XCR0 related). > > >> > >> ok? feedback? I can reproduce the issue with the current debian netinst; I'm seeing the hang from [1]. With your diff, kernel boots and netinst installation works. And I see no regression with my OpenBSD guests. Diff looks good. ok hshoexer > >> > >> [1] https://marc.info/?l=openbsd-misc&m=175505530816494&w=2 > >> > >> > > diff refs/heads/master refs/heads/vmm-xsave-pku > commit - b981b677995ab0627cfaec84db748a34dc362f99 > commit + 8808090b6a2e33830bfce786baad06590e4edbd1 > blob - bd1909f33280ca49fa1c546c313f54ac3a52d9b1 > blob + 471b12b223a8b12fdca49d7dd21df999516b6e23 > --- sys/arch/amd64/amd64/vmm_machdep.c > +++ sys/arch/amd64/amd64/vmm_machdep.c > @@ -3468,8 +3468,8 @@ vmm_fpusave(struct vcpu *vcpu) > } > > /* > - * Save full copy of FPU state - guest content is always > - * a subset of host's save area (see xsetbv exit handler) > + * Save a copy of FPU state - guest content is always > + * a subset of host's save area. PKRU is saved separately. > */ > fpusavereset(&vcpu->vc_g_fpu); > vcpu->vc_fpuinited = 1; > @@ -5927,7 +5927,7 @@ svm_handle_xsetbv(struct vcpu *vcpu) > int > vmm_handle_xsetbv(struct vcpu *vcpu, uint64_t *rax) > { > - uint64_t *rdx, *rcx, val; > + uint64_t *rdx, *rcx, val, mask = xsave_mask & XFEATURE_XCR0_MASK; > > rcx = &vcpu->vc_gueststate.vg_rcx; > rdx = &vcpu->vc_gueststate.vg_rdx; > @@ -5943,8 +5943,12 @@ vmm_handle_xsetbv(struct vcpu *vcpu, uint64_t *rax) > return (vmm_inject_gp(vcpu)); > } > > + /* If we're exposing PKRU features, allow guests to set PKRU in xcr0. */ > + if (vmm_softc->sc_md.pkru_enabled) > + mask |= XFEATURE_PKRU; > + > val = *rax + (*rdx << 32); > - if (val & ~xsave_mask) { > + if (val & ~mask) { > DPRINTF("%s: guest specified xcr0 outside xsave_mask %lld\n", > __func__, val); > return (vmm_inject_gp(vcpu)); > @@ -6187,6 +6191,19 @@ vmm_handle_cpuid_0xd(struct vcpu *vcpu, uint32_t suble > } > eax = xsave_mask & XFEATURE_XCR0_MASK; > edx = (xsave_mask & XFEATURE_XCR0_MASK) >> 32; > + > + /* > + * Emulate support for the pkru xsave region if the > + * host has pku enabled. This allow guests to enable > + * it in xcr0 and use xsave/xrstor on context switches > + * to save or restore pkru. > + */ > + if (vmm_softc->sc_md.pkru_enabled) { > + eax |= XFEATURE_PKRU; > + ecx = sizeof(struct savefpu) + sizeof(uint64_t); > + if (xcr0 & XFEATURE_PKRU) > + ebx = ecx; > + } > } else if (subleaf == 1) { > /* mask out XSAVEC, XSAVES, and XFD support */ > eax &= XSAVE_XSAVEOPT | XSAVE_XGETBV1; >