From: Mike Larkin Subject: Re: vmm(4): return error for VMCALL instead of injecting #UD To: Dave Voutila Cc: tech@openbsd.org Date: Tue, 17 Mar 2026 08:11:11 -0700 On Mon, Mar 16, 2026 at 04:51:21PM -0400, Dave Voutila wrote: > hshoexer writes: > > > Hi, > > > > On Mon, Mar 16, 2026 at 11:26:14AM -0400, Dave Voutila wrote: > >> Miguel Landaeta writes: > >> > >> > On Mon, Mar 16, 2026 at 11:51:00AM +0000, Miguel Landaeta wrote: > >> >> > >> >> Does this approach make sense, or is there a better way to handle this? > >> > >> Did this just start happening with linux guests? I haven't seen this > >> yet. I think sf@ changed how we advertise ourselves as KVM to Linux > >> awhile ago specifically to get Linux to properly use the kvm > >> paravirtualized clock. > >> > >> Not saying we shouldn't fix this if it's broken, but confused why this > >> hasn't popped up on my machines. > >> > >> > > >> > Sending a second revision. I realized that vmm(4) should continue > >> > to inject #UD if the hypercall isn't from ring 0 (this case was missed > >> > in the previous patch). > >> > >> Is #UD correct when cpl > 0? My go-to reference for this stuff says it's > >> #GP(0): https://www.felixcloutier.com/x86/vmcall > > > > good point! > > > >> mlarkin@ & hshoexer@ were working on improving some VMCALL handling I > >> think. Curious their thoughts here. > > > > I think we have different behaviour with AMDs VMMCALL and Intels > > VMCALL instruction: With VMMCALL I think #UD is the right way to > > forcefully abort VMMCALL, as VMMCALL only raises #UD. And on Intel > > with VMCALL it should be #GP(0) for CPL > 0. > > > > So regress/sys/arch/amd/vmcall needs to be adjusted, too. > > > >> > > >> > > >> > Index: sys/arch/amd64/amd64/vmm_machdep.c > >> > =================================================================== > >> > RCS file: /home/cvs/src/sys/arch/amd64/amd64/vmm_machdep.c,v > >> > diff -u -p -u -p -r1.72 vmm_machdep.c > >> > --- sys/arch/amd64/amd64/vmm_machdep.c 16 Feb 2026 15:08:41 -0000 1.72 > >> > +++ sys/arch/amd64/amd64/vmm_machdep.c 16 Mar 2026 14:00:43 -0000 > >> > @@ -4301,8 +4301,15 @@ svm_handle_exit(struct vcpu *vcpu) > >> > vcpu->vc_gueststate.vg_rax == HVCALL_FORCED_ABORT) > >> > return (EINVAL); > >> > DPRINTF("SVM_VMEXIT_VMMCALL at cpl=%d\n", guest_cpl); > >> > - ret = vmm_inject_ud(vcpu); > >> > - update_rip = 0; > >> > + if (guest_cpl != 0) { > >> > + ret = vmm_inject_ud(vcpu); > >> > + update_rip = 0; > >> > + break; > >> > + } > >> > + /* Ring 0 unsupported hypercall - return error */ > >> > + vcpu->vc_gueststate.vg_rax = -1; > >> > + vcpu->vc_gueststate.vg_rip += 3; /* VMMCALL is 3 bytes long */ > >> > >> vmm(4) has this rosey view of the world in places and I really think it > >> shouldn't. Technically any x86 instruction can have zero or many > >> repeated prefixes that are ignored so the actual length of VMCALL could > >> be between 3 and 15 bytes inclusive. > > > > correct. That's quite tricky. I experimented with this when writing > > regress/sys/arch/amd64/{vmcall,vmmcall}. As I was opting for always > > aborting VMMCALL/VMCALL the actual instruction sequence size was > > not relevant. > > > > If we want to skip VMMCALL/VMCALL (or acutally implement VM calls > > besides HVCALL_FORCED_ABORT) I think we would have to copy in up > > to 15 bytes (and maybe consider page boundaries and EFAULT, etc.) > > and decode the sequence. Another option might be to only allow 0f > > 01 c1 at RIP ("pure" vm call instruction) and abort everything else > > (padding is used) as before. > > > > I think VMX will give us a valid instruction length we can > retrieve and trust. > > For SVM, I think nRIP is applicable here and should be accurate? I think > we're at a point where we might want to enforce nRIP support if we don't > already. (I think this has come up before.) > Let's just enforce nRIP requirements at this time. This should make this diff series much easier. Either of you have time today to work on that? LMK (or I can see if I can carve out a few hours). -ml > >> Would a compiler every output something like this? Probably not. But > >> it's totally legal to just pad instructions like this! Here's a 10-byte > >> VMCALL: > >> > >> $ cstool -d x64 "40 40 40 40 40 40 40 0F 01 C1" > >> 0 40 40 40 40 40 40 40 0f 01 c1 vmcall > >> ID: 1011 (vmcall) > >> Prefix:0x00 0x00 0x00 0x00 > >> Opcode:0x0f 0x01 0x00 0x00 > >> rex: 0x40 > >> addr_size: 8 > >> modrm: 0xc1 > >> disp: 0x0 > >> sib: 0x0 > >> Groups: privilege vm > >> > >> > + update_rip = 1; > >> > break; > >> > default: > >> > DPRINTF("%s: unhandled exit 0x%llx (pa=0x%llx)\n", __func__, > >> > @@ -4741,8 +4748,15 @@ vmx_handle_exit(struct vcpu *vcpu) > >> > vcpu->vc_gueststate.vg_rax == HVCALL_FORCED_ABORT) > >> > return (EINVAL); > >> > DPRINTF("VMX_EXIT_VMCALL at cpl=%d\n", guest_cpl); > >> > - ret = vmm_inject_ud(vcpu); > >> > - update_rip = 0; > >> > + if (guest_cpl != 0) { > >> > + ret = vmm_inject_ud(vcpu); > >> > + update_rip = 0; > >> > + break; > >> > + } > >> > + /* Ring 0 unsupported hypercall - return error */ > >> > + vcpu->vc_gueststate.vg_rax = -1; > >> > + vcpu->vc_gueststate.vg_rip += 3; /* VMCALL is 3 bytes long */ > >> > >> Same issue here technically. > >> > >> > + update_rip = 1; > >> > break; > >> > default: > >> > #ifdef VMM_DEBUG > >> >