vmm(4): For SEV-ES enabled guest raise #VC on MMIO related #NPF
On Sun, Nov 16, 2025 at 06:17:24PM -0800, Mike Larkin wrote:
> On Thu, Nov 13, 2025 at 01:04:08PM +0100, hshoexer wrote:
> > Hi,
> >
> > for SEV-ES enabled guests MMIO should raise a #VC trap with error
> > code SVM_VMEXIT_NPF. This is required as vmm(4) can not analyze
> > the guest code causing the exit; the guest itself has to assist
> > which requires a #VC trap being raised.
> >
> > As an SEV-ES guest we try to avoid MMIO by using paravirtualization.
> > Nonetheless, vmm(4) should enforce #NPF/MMIO for SEV guests, e.g.
> > to find bugs.
> >
> > Enforcing #VC on MMIO is accomplished by setting up a mapping in
> > the nested page table that has reserved bits set. The bits of the
> > physical address portion of the PTE that are affected by physical
> > address bit reduction are reserved. When these bits are set, a #VC
> > trap with error code SVM_VMEXIT_NPF is raised when the guest resumes.
> > Further MMIO to the same page will raise #VC right away.
> >
> > This approach is recommended by [1] section 4.1.5. It can be tested
> > with regress/sys/arch/amd64/seves_mmio/.
> >
> > For non-SEV-ES guests nothing changes, we forward the fault to
> > vmd(8) as before.
> >
> > [1] https://docs.amd.com/v/u/en-US/56421
> >
>
> I will need more detail as to what is going on in the diff below, before I can
> proceed.
>
> it looks like you are letting the guest map any gpa to (?) some HPA whose
> calculation is not obvious or clear. are you trying to force a bad mapping?
> why not just use PROT_NONE for the mapping and force it to some completely
> bogus HPA? that way you're sure the guest can't sneak some mappings past you?
>
> or is this what you are doing? I can't understand from the diff. is this what
> the "address reduction" part is?

Yes, that's what I'm trying to do.

But let's take a step back: On amd64, physical addresses have a
maximum size of 52 bits. The topmost 12 bits are reserved (and
in the PTE these 12 bits are used for e.g. NX, XO, etc.).

With SEV enabled, bit 51 is repurposed as the crypt bit (C-bit),
indicating whether the addressed memory is to be encrypted or not.
The next few bits are used for the ASID.

Now: According to [2] p. 37, the actually addressable physical
memory and the number of ASIDs are determined by the firmware. On my
EPYC 9124 the physical address size is reduced from 52 to 46 bits
(indicated by cpuid 0x8000001f.ebx), so bits 51:46 are considered
reserved.
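
To make the two CPUID fields concrete, here is a minimal sketch of how
EBX of leaf 0x8000001f splits up, using the same masks as the diff
below. The struct and function names are made up for illustration and
are not part of the patch; the sample value in the note underneath is
simply constructed from the numbers above (C-bit at 51, reduction of
6 bits), not read from real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two EBX fields the diff extracts. */
struct sev_ebx {
	unsigned int cbit_pos;	/* bits 5:0: position of the C-bit */
	unsigned int phys_red;	/* bits 11:6: physical address bit reduction */
};

/* Split a raw CPUID 0x8000001f EBX value with the masks from the diff. */
static struct sev_ebx
sev_decode_ebx(uint32_t ebx)
{
	struct sev_ebx r;

	r.cbit_pos = ebx & 0x3f;
	r.phys_red = (ebx >> 6) & 0x3f;
	return r;
}
```

Feeding it the constructed value (6 << 6) | 51 gives cbit_pos 51 and
phys_red 6, which matches what amd64_pos_cbit and amd64_phys_red would
hold on such a machine.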

Now, to raise a #VC exception on MMIO, the GHCB manual [1] section
4.1.5 says that the nested page table needs an entry for the MMIO
space that has a physical address with these reserved bits set.
So the entry looks like this: 0x000fc00000000000.
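
As a worked example of where that value comes from: assuming an
address size of 52 and a reduction of 6, as on the machine above, the
computation used in the diff can be reproduced in userland roughly
like this. reserved_hpa() is a hypothetical helper name, not something
in the tree, and the assert() is only there to keep the sketch honest
about its inputs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build a physical address that has exactly the "reduced" bits set,
 * i.e. the top 'red' bits of an 'addrsz'-bit physical address.
 * Mirrors the two statements in svm_handle_np_fault() from the diff.
 */
static uint64_t
reserved_hpa(unsigned int addrsz, unsigned int red)
{
	uint64_t hpa;

	assert(red > 0 && red < addrsz && addrsz <= 52);

	hpa = (1ULL << red) - 1;	/* 'red' low bits set */
	hpa <<= (addrsz - red);		/* move them to the top of the address */
	return hpa;
}
```

reserved_hpa(52, 6) sets bits 51:46 and yields 0x000fc00000000000.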

So any GPA that is assigned to the MMIO range of physical addresses
(type VMM_MEM_TYPE_MMIO) will raise a nested page fault on (first)
access. In case of a SEV-ES enabled guest I'd say we now have to set
up a nested page table entry that has those reserved bits set.
When resuming the guest, the access is retried and now yields #VC
(and on future accesses, too). And we can use that particular entry
for any GPA from the MMIO range.

And that's what I'm trying to do with that diff. And
regress/sys/arch/amd64/seves_mmio/ yields a positive result (i.e.
#VC gets raised).

When I just set an entry with PROT_NONE, I keep getting NP faults
(to be handled by vmm) and not a #VC in the guest.

[1] https://docs.amd.com/v/u/en-US/56421
[2] https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/tuning-guides/58207-using-sev-with-amd-epyc-processors.pdf

> also, I'm not sure we can just "only" use pmap_enter anymore to put mappings
> in the guest; I know dv was doing work in this area a few months ago WRT
> uvm_share and its interaction with uvm_fault. do we need to use uvm_fault here?
> if so, what would the permissions be? (dv?)

Good point! I'm not sure. dv, any advice?

Thanks and take care,
HJ.

> -ml
>
>
> > ----------------------------------
> > diff --git a/sys/arch/amd64/amd64/identcpu.c b/sys/arch/amd64/amd64/identcpu.c
> > index ac4e845a1aa..86f05f1a7bf 100644
> > --- a/sys/arch/amd64/amd64/identcpu.c
> > +++ b/sys/arch/amd64/amd64/identcpu.c
> > @@ -67,6 +67,8 @@ int cpuspeed;
> >
> > int amd64_has_xcrypt;
> > int amd64_pos_cbit; /* C bit position for SEV */
> > +int amd64_phys_addrsz; /* Physical address size */
> > +int amd64_phys_red; /* Physical address size reduction */
> > int amd64_min_noes_asid;
> > int has_rdrand;
> > int has_rdseed;
> > @@ -676,8 +678,9 @@ identifycpu(struct cpu_info *ci)
> > /* speculation control features */
> > if (ci->ci_vendor == CPUV_AMD) {
> > if (ci->ci_pnfeatset >= 0x80000008) {
> > - CPUID(0x80000008, dummy, ci->ci_feature_amdspec_ebx,
> > - dummy, dummy);
> > + CPUID(0x80000008, ci->ci_feature_amdspec_eax,
> > + ci->ci_feature_amdspec_ebx, dummy, dummy);
> > + amd64_phys_addrsz = ci->ci_feature_amdspec_eax & 0xff;
> > pcpuid(ci, "80000008", 'b',
> > CPUID_MEMBER(ci_feature_amdspec_ebx),
> > CPUID_AMDSPEC_EBX_BITS);
> > @@ -711,6 +714,7 @@ identifycpu(struct cpu_info *ci)
> > 'd', CPUID_MEMBER(ci_feature_amdsev_edx),
> > CPUID_AMDSEV_EDX_BITS);
> > amd64_pos_cbit = (ci->ci_feature_amdsev_ebx & 0x3f);
> > + amd64_phys_red = ((ci->ci_feature_amdsev_ebx >> 6) & 0x3f);
> > amd64_min_noes_asid = ci->ci_feature_amdsev_edx;
> > if (cpu_sev_guestmode && CPU_IS_PRIMARY(ci))
> > printf("\n%s: SEV%s guest mode", ci->ci_dev->dv_xname,
> > diff --git a/sys/arch/amd64/amd64/vmm_machdep.c b/sys/arch/amd64/amd64/vmm_machdep.c
> > index abcd177a700..be6c0aea6ca 100644
> > --- a/sys/arch/amd64/amd64/vmm_machdep.c
> > +++ b/sys/arch/amd64/amd64/vmm_machdep.c
> > @@ -5023,6 +5023,7 @@ int
> > svm_handle_np_fault(struct vcpu *vcpu)
> > {
> > uint64_t gpa;
> > + paddr_t hpa;
> > int gpa_memtype, ret = 0;
> > struct vmcb *vmcb = (struct vmcb *)vcpu->vc_control_va;
> > struct vm_exit_eptviolation *vee = &vcpu->vc_exit.vee;
> > @@ -5039,14 +5040,25 @@ svm_handle_np_fault(struct vcpu *vcpu)
> > ret = svm_fault_page(vcpu, gpa);
> > break;
> > case VMM_MEM_TYPE_MMIO:
> > - vee->vee_fault_type = VEE_FAULT_MMIO_ASSIST;
> > - if (ci->ci_vmm_cap.vcc_svm.svm_decode_assist) {
> > - vee->vee_insn_len = vmcb->v_n_bytes_fetched;
> > - memcpy(&vee->vee_insn_bytes, vmcb->v_guest_ins_bytes,
> > - sizeof(vee->vee_insn_bytes));
> > - vee->vee_insn_info |= VEE_BYTES_VALID;
> > + if (vcpu->vc_seves) {
> > + vee->vee_fault_type = VEE_FAULT_HANDLED;
> > + /* Setup invalid mapping to raise #VC in guest */
> > + hpa = (1ULL << amd64_phys_red) - 1;
> > + hpa <<= (amd64_phys_addrsz - amd64_phys_red);
> > + ret = pmap_enter(vcpu->vc_parent->vm_pmap,
> > + trunc_page(gpa), hpa,
> > + PROT_READ | PROT_WRITE | PROT_EXEC, 0);
> > + } else {
> > + vee->vee_fault_type = VEE_FAULT_MMIO_ASSIST;
> > + if (ci->ci_vmm_cap.vcc_svm.svm_decode_assist) {
> > + vee->vee_insn_len = vmcb->v_n_bytes_fetched;
> > + memcpy(&vee->vee_insn_bytes,
> > + vmcb->v_guest_ins_bytes,
> > + sizeof(vee->vee_insn_bytes));
> > + vee->vee_insn_info |= VEE_BYTES_VALID;
> > + }
> > + ret = EAGAIN;
> > }
> > - ret = EAGAIN;
> > break;
> > default:
> > printf("%s: unknown memory type %d for GPA 0x%llx\n",
> > diff --git a/sys/arch/amd64/include/cpu.h b/sys/arch/amd64/include/cpu.h
> > index c7d34e634cf..2c8616dede9 100644
> > --- a/sys/arch/amd64/include/cpu.h
> > +++ b/sys/arch/amd64/include/cpu.h
> > @@ -170,6 +170,7 @@ struct cpu_info {
> > u_int32_t ci_feature_sefflags_ebx;/* [I] */
> > u_int32_t ci_feature_sefflags_ecx;/* [I] */
> > u_int32_t ci_feature_sefflags_edx;/* [I] */
> > + u_int32_t ci_feature_amdspec_eax; /* [I] */
> > u_int32_t ci_feature_amdspec_ebx; /* [I] */
> > u_int32_t ci_feature_amdsev_eax; /* [I] */
> > u_int32_t ci_feature_amdsev_ebx; /* [I] */
> > @@ -418,6 +419,8 @@ void identifycpu(struct cpu_info *);
> > int cpu_amd64speed(int *);
> > extern int cpuspeed;
> > extern int amd64_pos_cbit;
> > +extern int amd64_phys_addrsz;
> > +extern int amd64_phys_red;
> > extern int amd64_min_noes_asid;
> >
> > /* machdep.c */
> >
>