Index | Thread | Search

From:
Hans-Jörg Höxer <hshoexer@genua.de>
Subject:
Re: [EXT] Re: Allow vmm guests to save/restore pkru via xsave region
To:
<tech@openbsd.org>
Date:
Thu, 28 Aug 2025 21:25:30 +0200

Download raw body.

Thread
Hi,

On Wed, Aug 20, 2025 at 04:12:07PM -0400, Dave Voutila wrote:
> Dave Voutila <dv@sisu.io> writes:
> 
> > Dave Voutila <dv@sisu.io> writes:
> >
> >> We're currently letting a guest see the PKU bit via cpuid(0x7, 0) but
> >> then not exposing the XSAVE area bit for PKRU via cpuid(0xd, 0).
> >>
> >> This is fine for OpenBSD guests as we don't use XSAVE/XRSTOR to
> >> save/load PKRU state on context switches given how we use PKU today for
> >> x-only on Intel. (We explicitly use RDPKRU/WRPKRU.)
> >>
> >> Newer Linux guests see PKU support and then freak out when the support
> >> for XSAVE doesn't exist. This puts them into a very "fun" state where
> >> they don't panic but instead thrash about like a lunatic. It's sad,
> >> really. See [1] for details.
> >>
> >> This diff exposes the cpuid bit for the PKRU xsave area if the host has
> >> enabled PKU mode. It also adds handling to let the guest set the xcr0
> >> bit to let XSAVE/XRSTOR work for the PKRU state.
> >>
> >> It does not change host PKU or XSAVE behavior.
> >>
> >> The cpuid emulation change is a bit hacky in reporting xsave area max
> >> size and current size, but since the PKRU state comes after all the
> >> AVX512 stuff it's basically just needing to use the size of `struct
> >> savefpu` plus 64 bits (32 for PKRU, 32 for padding/alignment).
> >
> > Had a slight mistake in previous diff and was adding 64 bytes to the
> > end, not 64 bits. Doh. Now this looks less like it's using a magic
> > number of 64 and instead uses a uint64_t.
> >
> 
> This version makes a comment more intelligible and fixes what looks like
> a related issue in the xsetbv handler (not masking to just the bits in
> the xsave_mask that are XCR0 related).
> 
> >>
> >> ok? feedback?

I can reproduce the issue with the current debian netinst; I'm seeing
the hang from [1].  With your diff, kernel boots and netinst installation
works.  And I see no regression with my OpenBSD guests.

Diff looks good.

ok hshoexer

> >>
> >> [1] https://marc.info/?l=openbsd-misc&m=175505530816494&w=2
> >>
> >>
> 
> diff refs/heads/master refs/heads/vmm-xsave-pku
> commit - b981b677995ab0627cfaec84db748a34dc362f99
> commit + 8808090b6a2e33830bfce786baad06590e4edbd1
> blob - bd1909f33280ca49fa1c546c313f54ac3a52d9b1
> blob + 471b12b223a8b12fdca49d7dd21df999516b6e23
> --- sys/arch/amd64/amd64/vmm_machdep.c
> +++ sys/arch/amd64/amd64/vmm_machdep.c
> @@ -3468,8 +3468,8 @@ vmm_fpusave(struct vcpu *vcpu)
>  	}
> 
>  	/*
> -	 * Save full copy of FPU state - guest content is always
> -	 * a subset of host's save area (see xsetbv exit handler)
> +	 * Save a copy of FPU state - guest content is always
> +	 * a subset of host's save area. PKRU is saved separately.
>  	 */
>  	fpusavereset(&vcpu->vc_g_fpu);
>  	vcpu->vc_fpuinited = 1;
> @@ -5927,7 +5927,7 @@ svm_handle_xsetbv(struct vcpu *vcpu)
>  int
>  vmm_handle_xsetbv(struct vcpu *vcpu, uint64_t *rax)
>  {
> -	uint64_t *rdx, *rcx, val;
> +	uint64_t *rdx, *rcx, val, mask = xsave_mask & XFEATURE_XCR0_MASK;
> 
>  	rcx = &vcpu->vc_gueststate.vg_rcx;
>  	rdx = &vcpu->vc_gueststate.vg_rdx;
> @@ -5943,8 +5943,12 @@ vmm_handle_xsetbv(struct vcpu *vcpu, uint64_t *rax)
>  		return (vmm_inject_gp(vcpu));
>  	}
> 
> +	/* If we're exposing PKRU features, allow guests to set PKRU in xcr0. */
> +	if (vmm_softc->sc_md.pkru_enabled)
> +		mask |= XFEATURE_PKRU;
> +
>  	val = *rax + (*rdx << 32);
> -	if (val & ~xsave_mask) {
> +	if (val & ~mask) {
>  		DPRINTF("%s: guest specified xcr0 outside xsave_mask %lld\n",
>  		    __func__, val);
>  		return (vmm_inject_gp(vcpu));
> @@ -6187,6 +6191,19 @@ vmm_handle_cpuid_0xd(struct vcpu *vcpu, uint32_t suble
>  		}
>  		eax = xsave_mask & XFEATURE_XCR0_MASK;
>  		edx = (xsave_mask & XFEATURE_XCR0_MASK) >> 32;
> +
> +		/*
> +		 * Emulate support for the pkru xsave region if the
> +		 * host has pku enabled. This allow guests to enable
> +		 * it in xcr0 and use xsave/xrstor on context switches
> +		 * to save or restore pkru.
> +		 */
> +		if (vmm_softc->sc_md.pkru_enabled) {
> +			eax |= XFEATURE_PKRU;
> +			ecx = sizeof(struct savefpu) + sizeof(uint64_t);
> +			if (xcr0 & XFEATURE_PKRU)
> +				ebx = ecx;
> +		}
>  	} else if (subleaf == 1) {
>  		/* mask out XSAVEC, XSAVES, and XFD support */
>  		eax &= XSAVE_XSAVEOPT | XSAVE_XGETBV1;
>