Download raw body.
[EXT] Re: Allow vmm guests to save/restore pkru via xsave region
Hi,
On Wed, Aug 20, 2025 at 04:12:07PM -0400, Dave Voutila wrote:
> Dave Voutila <dv@sisu.io> writes:
>
> > Dave Voutila <dv@sisu.io> writes:
> >
> >> We're currently letting a guest see the PKU bit via cpuid(0x7, 0) but
> >> then not exposing the XSAVE area bit for PKRU via cpuid(0xd, 0).
> >>
> >> This is fine for OpenBSD guests as we don't use XSAVE/XRSTOR to
> >> save/load PKRU state on context switches given how we use PKU today for
> >> x-only on Intel. (We explicitly use RDPKRU/WRPKRU.)
> >>
> >> Newer Linux guests see PKU support and then freak out when the support
> >> for XSAVE doesn't exist. This puts them into a very "fun" state where
> >> they don't panic but instead thrash about like a lunatic. It's sad,
> >> really. See [1] for details.
> >>
> >> This diff exposes the cpuid bit for the PKRU xsave area if the host has
> >> enabled PKU mode. It also adds handling to let the guest set the xcr0
> >> bit to let XSAVE/XRSTOR work for the PKRU state.
> >>
> >> It does not change host PKU or XSAVE behavior.
> >>
> >> The cpuid emulation change is a bit hacky in reporting xsave area max
> >> size and current size, but since the PKRU state comes after all the
> >> AVX512 stuff it's basically just needing to use the size of `struct
> >> savefpu` plus 64 bits (32 for PKRU, 32 for padding/alignment).
> >
> > Had a slight mistake in previous diff and was adding 64 bytes to the
> > end, not 64 bits. Doh. Now this looks less like it's using a magic
> > number of 64 and instead uses a uint64_t.
> >
>
> This version makes a comment more intelligible and fixes what looks like
> a related issue in the xsetbv handler (not masking to just the bits in
> the xsave_mask that are XCR0 related).
>
> >>
> >> ok? feedback?
I can reproduce the issue with the current debian netinst; I'm seeing
the hang from [1]. With your diff, kernel boots and netinst installation
works. And I see no regression with my OpenBSD guests.
Diff looks good.
ok hshoexer
> >>
> >> [1] https://marc.info/?l=openbsd-misc&m=175505530816494&w=2
> >>
> >>
>
> diff refs/heads/master refs/heads/vmm-xsave-pku
> commit - b981b677995ab0627cfaec84db748a34dc362f99
> commit + 8808090b6a2e33830bfce786baad06590e4edbd1
> blob - bd1909f33280ca49fa1c546c313f54ac3a52d9b1
> blob + 471b12b223a8b12fdca49d7dd21df999516b6e23
> --- sys/arch/amd64/amd64/vmm_machdep.c
> +++ sys/arch/amd64/amd64/vmm_machdep.c
> @@ -3468,8 +3468,8 @@ vmm_fpusave(struct vcpu *vcpu)
> }
>
> /*
> - * Save full copy of FPU state - guest content is always
> - * a subset of host's save area (see xsetbv exit handler)
> + * Save a copy of FPU state - guest content is always
> + * a subset of host's save area. PKRU is saved separately.
> */
> fpusavereset(&vcpu->vc_g_fpu);
> vcpu->vc_fpuinited = 1;
> @@ -5927,7 +5927,7 @@ svm_handle_xsetbv(struct vcpu *vcpu)
> int
> vmm_handle_xsetbv(struct vcpu *vcpu, uint64_t *rax)
> {
> - uint64_t *rdx, *rcx, val;
> + uint64_t *rdx, *rcx, val, mask = xsave_mask & XFEATURE_XCR0_MASK;
>
> rcx = &vcpu->vc_gueststate.vg_rcx;
> rdx = &vcpu->vc_gueststate.vg_rdx;
> @@ -5943,8 +5943,12 @@ vmm_handle_xsetbv(struct vcpu *vcpu, uint64_t *rax)
> return (vmm_inject_gp(vcpu));
> }
>
> + /* If we're exposing PKRU features, allow guests to set PKRU in xcr0. */
> + if (vmm_softc->sc_md.pkru_enabled)
> + mask |= XFEATURE_PKRU;
> +
> val = *rax + (*rdx << 32);
> - if (val & ~xsave_mask) {
> + if (val & ~mask) {
> DPRINTF("%s: guest specified xcr0 outside xsave_mask %lld\n",
> __func__, val);
> return (vmm_inject_gp(vcpu));
> @@ -6187,6 +6191,19 @@ vmm_handle_cpuid_0xd(struct vcpu *vcpu, uint32_t suble
> }
> eax = xsave_mask & XFEATURE_XCR0_MASK;
> edx = (xsave_mask & XFEATURE_XCR0_MASK) >> 32;
> +
> + /*
> + * Emulate support for the pkru xsave region if the
> + * host has pku enabled. This allow guests to enable
> + * it in xcr0 and use xsave/xrstor on context switches
> + * to save or restore pkru.
> + */
> + if (vmm_softc->sc_md.pkru_enabled) {
> + eax |= XFEATURE_PKRU;
> + ecx = sizeof(struct savefpu) + sizeof(uint64_t);
> + if (xcr0 & XFEATURE_PKRU)
> + ebx = ecx;
> + }
> } else if (subleaf == 1) {
> /* mask out XSAVEC, XSAVES, and XFD support */
> eax &= XSAVE_XSAVEOPT | XSAVE_XGETBV1;
>
[EXT] Re: Allow vmm guests to save/restore pkru via xsave region