From: hshoexer <hshoexer@yerbouti.franken.de>
Subject: Re: vmm(4):  For SEV-ES enabled guest raise #VC on MMIO related #NPF
To: tech@openbsd.org
Cc: mlarkin@openbsd.org, dv@openbsd.org
Date: Wed, 26 Nov 2025 10:47:00 +0100

On Mon, Nov 17, 2025 at 08:44:21PM +0100, hshoexer wrote:
> On Sun, Nov 16, 2025 at 06:17:24PM -0800, Mike Larkin wrote:
> > On Thu, Nov 13, 2025 at 01:04:08PM +0100, hshoexer wrote:
> > > Hi,
> > >
> > > for SEV-ES enabled guests MMIO should raise a #VC trap with error
> > > code SVM_VMEXIT_NPF.  This is required as vmm(4) can not analyze
> > > the guest code causing the exit; the guest itself has to assist
> > > which requires a #VC trap being raised.
> > >
> > > As SEV-ES guest we try to avoid MMIO by using paravirtualization.
> > > Nonetheless, vmm(4) should enforce #NPF/MMIO for SEV guests; e.g.
> > > to find bugs.
> > >
> > > Enforcing #VC on MMIO is accomplished by setting up a mapping in
> > > the nested page table that has reserved bits set.  The bits of the
> > > physical address portion of the PTE that are affected by physical
> > > address bit reduction are reserved.  When these bits are set, a #VC
> > > trap with error code SVM_VMEXIT_NPF is raised when the guest resumes.
> > > Further MMIO to the same page will raise #VC right away.
> > >
> > > This approach is recommended by [1] section 4.1.5.  It can be tested
> > > with regress/sys/arch/amd64/seves_mmio/.
> > >
> > > For non-SEV-ES guests nothing changes, we forward the fault to
> > > vmd(8) as before.
> > >
> > > [1] https://docs.amd.com/v/u/en-US/56421
> > >
> > 
> > I will need more detail as to what is going on in the diff below, before I can
> > proceed.
> > 
> > it looks like you are letting the guest map any gpa to (?) some HPA whose
> > calculation is not obvious or clear. are you trying to force a bad mapping?
> > why not just use PROT_NONE for the mapping and force it to some completely
> > bogus HPA? that way you're sure the guest can't sneak some mappings past you?
> > 
> > or is this what you are doing? I can't understand from the diff. is this what
> > the "address reduction" part is?
> 
> yes, that's what I'm trying to do.
> 
> But let's take a step back:  On amd64 physical addresses have a
> maximum size of 52 bits.  The top most 12 bits are reserved (and
> in the PTE these 12 bits are used for eg. NX XO, etc.)
> 
> With SEV enabled the bit 51 is repurposed as crypt bit, indicating
> whether the addressed memory is to be encrypted or not.  The next
> few bits are used for the ASID.
> 
> Now:  According to [2] p. 37 the actually addressable physical
> memory and the number of ASID is determind by the firmware. On my
> EPYC 9124 the physical address size is reduced from 52 to 46
> (indicated by cpuid 0x000001f.ebx).  The bits 51:47 are considered
> reserved.
> 
> Now, to raise a #VC exception on MMIO the GHCB manual [1] section
> 4.1.5 claims that the nested page table needs an entry for MMIO
> space that has a physcial address with these reserved bits set.
> So the entry looks like this: 0x000fc00000000000.
> 
> So any GPA that is assigned to the MMIO range of physcial addresses
> (type VMM_MEM_TYPE_MMIO) will raise a nested page fault on (first)
> access.  In case of a SEV enabled guest I'd say we now have to set
> up a nested page table entry, that has those reserved bits set.
> When resuming the guest the access is retried and now yields #VC
> (and on future access, too).  And we can use that particular entry
> for any GPA from the MMIO range.
> 
> And that's what I'm trying to do with that diff.  And
> regress/sys/arch/amd64/seves_mmio/ yields a positive result (ie.
> #VC gets raised).
> 
> When I just set an entry with PROT_NONE I keep getting NP faults
> (to be handle by vmm) and not a #VC in the guest.

does this explanation make sense to you?

> [1] https://docs.amd.com/v/u/en-US/56421   
> [2] https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/tuning-guides/58207-using-sev-with-amd-epyc-processors.pdf
> 
> > also, I'm not sure we can just "only" use pmap_enter anymore to put mappings
> > in the guest; I know dv was doing work in this area a few months ago WRT
> > uvm_share and its interaction with uvm_fault. do we need to use uvm_fault here?
> > if so, what would the permissions be? (dv?)
> 
> good point!  I'm not sure.  dv any advice?

any suggestions regarding the pmap_enter() chunk?  Am I missing
something there?

Take care,
HJ.


> > -ml
> > 
> > 
> > > ----------------------------------
> > > diff --git a/sys/arch/amd64/amd64/identcpu.c b/sys/arch/amd64/amd64/identcpu.c
> > > index ac4e845a1aa..86f05f1a7bf 100644
> > > --- a/sys/arch/amd64/amd64/identcpu.c
> > > +++ b/sys/arch/amd64/amd64/identcpu.c
> > > @@ -67,6 +67,8 @@ int cpuspeed;
> > >
> > >  int amd64_has_xcrypt;
> > >  int amd64_pos_cbit;	/* C bit position for SEV */
> > > +int amd64_phys_addrsz;	/* Physcial address size */
> > > +int amd64_phys_red;	/* Physcial address size reduction */
> > >  int amd64_min_noes_asid;
> > >  int has_rdrand;
> > >  int has_rdseed;
> > > @@ -676,8 +678,9 @@ identifycpu(struct cpu_info *ci)
> > >  	/* speculation control features */
> > >  	if (ci->ci_vendor == CPUV_AMD) {
> > >  		if (ci->ci_pnfeatset >= 0x80000008) {
> > > -			CPUID(0x80000008, dummy, ci->ci_feature_amdspec_ebx,
> > > -			    dummy, dummy);
> > > +			CPUID(0x80000008, ci->ci_feature_amdspec_eax,
> > > +			    ci->ci_feature_amdspec_ebx, dummy, dummy);
> > > +			amd64_phys_addrsz = ci->ci_feature_amdspec_eax & 0xff;
> > >  			pcpuid(ci, "80000008", 'b',
> > >  			    CPUID_MEMBER(ci_feature_amdspec_ebx),
> > >  			    CPUID_AMDSPEC_EBX_BITS);
> > > @@ -711,6 +714,7 @@ identifycpu(struct cpu_info *ci)
> > >  		    'd', CPUID_MEMBER(ci_feature_amdsev_edx),
> > >  		    CPUID_AMDSEV_EDX_BITS);
> > >  		amd64_pos_cbit = (ci->ci_feature_amdsev_ebx & 0x3f);
> > > +		amd64_phys_red = ((ci->ci_feature_amdsev_ebx >> 6) & 0x3f);
> > >  		amd64_min_noes_asid = ci->ci_feature_amdsev_edx;
> > >  		if (cpu_sev_guestmode && CPU_IS_PRIMARY(ci))
> > >  			printf("\n%s: SEV%s guest mode", ci->ci_dev->dv_xname,
> > > diff --git a/sys/arch/amd64/amd64/vmm_machdep.c b/sys/arch/amd64/amd64/vmm_machdep.c
> > > index abcd177a700..be6c0aea6ca 100644
> > > --- a/sys/arch/amd64/amd64/vmm_machdep.c
> > > +++ b/sys/arch/amd64/amd64/vmm_machdep.c
> > > @@ -5023,6 +5023,7 @@ int
> > >  svm_handle_np_fault(struct vcpu *vcpu)
> > >  {
> > >  	uint64_t gpa;
> > > +	paddr_t hpa;
> > >  	int gpa_memtype, ret = 0;
> > >  	struct vmcb *vmcb = (struct vmcb *)vcpu->vc_control_va;
> > >  	struct vm_exit_eptviolation *vee = &vcpu->vc_exit.vee;
> > > @@ -5039,14 +5040,25 @@ svm_handle_np_fault(struct vcpu *vcpu)
> > >  		ret = svm_fault_page(vcpu, gpa);
> > >  		break;
> > >  	case VMM_MEM_TYPE_MMIO:
> > > -		vee->vee_fault_type = VEE_FAULT_MMIO_ASSIST;
> > > -		if (ci->ci_vmm_cap.vcc_svm.svm_decode_assist) {
> > > -			vee->vee_insn_len = vmcb->v_n_bytes_fetched;
> > > -			memcpy(&vee->vee_insn_bytes, vmcb->v_guest_ins_bytes,
> > > -			    sizeof(vee->vee_insn_bytes));
> > > -			vee->vee_insn_info |= VEE_BYTES_VALID;
> > > +		if (vcpu->vc_seves) {
> > > +			vee->vee_fault_type = VEE_FAULT_HANDLED;
> > > +			/* Setup invalid mapping to raise #VC in guest */
> > > +			hpa = (1ULL << amd64_phys_red) - 1;
> > > +			hpa <<= (amd64_phys_addrsz - amd64_phys_red);
> > > +			ret = pmap_enter(vcpu->vc_parent->vm_pmap,
> > > +			    trunc_page(gpa), hpa,
> > > +			    PROT_READ | PROT_WRITE | PROT_EXEC, 0);
> > > +		} else {
> > > +			vee->vee_fault_type = VEE_FAULT_MMIO_ASSIST;
> > > +			if (ci->ci_vmm_cap.vcc_svm.svm_decode_assist) {
> > > +				vee->vee_insn_len = vmcb->v_n_bytes_fetched;
> > > +				memcpy(&vee->vee_insn_bytes,
> > > +				    vmcb->v_guest_ins_bytes,
> > > +				    sizeof(vee->vee_insn_bytes));
> > > +				vee->vee_insn_info |= VEE_BYTES_VALID;
> > > +			}
> > > +			ret = EAGAIN;
> > >  		}
> > > -		ret = EAGAIN;
> > >  		break;
> > >  	default:
> > >  		printf("%s: unknown memory type %d for GPA 0x%llx\n",
> > > diff --git a/sys/arch/amd64/include/cpu.h b/sys/arch/amd64/include/cpu.h
> > > index c7d34e634cf..2c8616dede9 100644
> > > --- a/sys/arch/amd64/include/cpu.h
> > > +++ b/sys/arch/amd64/include/cpu.h
> > > @@ -170,6 +170,7 @@ struct cpu_info {
> > >  	u_int32_t	ci_feature_sefflags_ebx;/* [I] */
> > >  	u_int32_t	ci_feature_sefflags_ecx;/* [I] */
> > >  	u_int32_t	ci_feature_sefflags_edx;/* [I] */
> > > +	u_int32_t	ci_feature_amdspec_eax;	/* [I] */
> > >  	u_int32_t	ci_feature_amdspec_ebx;	/* [I] */
> > >  	u_int32_t	ci_feature_amdsev_eax;	/* [I] */
> > >  	u_int32_t	ci_feature_amdsev_ebx;	/* [I] */
> > > @@ -418,6 +419,8 @@ void	identifycpu(struct cpu_info *);
> > >  int	cpu_amd64speed(int *);
> > >  extern int cpuspeed;
> > >  extern int amd64_pos_cbit;
> > > +extern int amd64_phys_addrsz;
> > > +extern int amd64_phys_red;
> > >  extern int amd64_min_noes_asid;
> > >
> > >  /* machdep.c */
> > >
> > 
>