From: Mike Larkin <mlarkin@nested.page>
Subject: Re: vmm(4): return error for VMCALL instead of injecting #UD
To: Dave Voutila <dv@sisu.io>
Cc: tech@openbsd.org
Date: Tue, 17 Mar 2026 08:11:11 -0700

On Mon, Mar 16, 2026 at 04:51:21PM -0400, Dave Voutila wrote:
> hshoexer <hshoexer@yerbouti.franken.de> writes:
>
> > Hi,
> >
> > On Mon, Mar 16, 2026 at 11:26:14AM -0400, Dave Voutila wrote:
> >> Miguel Landaeta <miguel@miguel.cc> writes:
> >>
> >> > On Mon, Mar 16, 2026 at 11:51:00AM +0000, Miguel Landaeta wrote:
> >> >>
> >> >> Does this approach make sense, or is there a better way to handle this?
> >>
> >> Did this just start happening with linux guests? I haven't seen this
> >> yet. I think sf@ changed how we advertise ourselves as KVM to Linux
> >> awhile ago specifically to get Linux to properly use the kvm
> >> paravirtualized clock.
> >>
> >> Not saying we shouldn't fix this if it's broken, but confused why this
> >> hasn't popped up on my machines.
> >>
> >> >
> >> > Sending a second revision. I realized that vmm(4) should continue
> >> > to inject #UD if the hypercall isn't from ring 0 (this case was missed
> >> > in the previous patch).
> >>
> >> Is #UD correct when cpl > 0? My go-to reference for this stuff says it's
> >> #GP(0): https://www.felixcloutier.com/x86/vmcall
> >
> > good point!
> >
> >> mlarkin@ & hshoexer@ were working on improving some VMCALL handling I
> >> think. Curious their thoughts here.
> >
> > I think we have different behaviour with AMDs VMMCALL and Intels
> > VMCALL instruction:  With VMMCALL I think #UD is the right way to
> > forcefully abort VMMCALL, as VMMCALL only raises #UD.   And on Intel
> > with VMCALL it should be #GP(0) for CPL > 0.
> >
> > So regress/sys/arch/amd/vmcall needs to be adjusted, too.
> >
> >> >
> >> >
> >> > Index: sys/arch/amd64/amd64/vmm_machdep.c
> >> > ===================================================================
> >> > RCS file: /home/cvs/src/sys/arch/amd64/amd64/vmm_machdep.c,v
> >> > diff -u -p -u -p -r1.72 vmm_machdep.c
> >> > --- sys/arch/amd64/amd64/vmm_machdep.c	16 Feb 2026 15:08:41 -0000	1.72
> >> > +++ sys/arch/amd64/amd64/vmm_machdep.c	16 Mar 2026 14:00:43 -0000
> >> > @@ -4301,8 +4301,15 @@ svm_handle_exit(struct vcpu *vcpu)
> >> >  		    vcpu->vc_gueststate.vg_rax == HVCALL_FORCED_ABORT)
> >> >  			return (EINVAL);
> >> >  		DPRINTF("SVM_VMEXIT_VMMCALL at cpl=%d\n", guest_cpl);
> >> > -		ret = vmm_inject_ud(vcpu);
> >> > -		update_rip = 0;
> >> > +		if (guest_cpl != 0) {
> >> > +			ret = vmm_inject_ud(vcpu);
> >> > +			update_rip = 0;
> >> > +			break;
> >> > +		}
> >> > +		/* Ring 0 unsupported hypercall - return error */
> >> > +		vcpu->vc_gueststate.vg_rax = -1;
> >> > +		vcpu->vc_gueststate.vg_rip += 3; /* VMMCALL is 3 bytes long */
> >>
> >> vmm(4) has this rosey view of the world in places and I really think it
> >> shouldn't. Technically any x86 instruction can have zero or many
> >> repeated prefixes that are ignored so the actual length of VMCALL could
> >> be between 3 and 15 bytes inclusive.
> >
> > correct.  That's quite tricky.  I experimented with this when writing
> > regress/sys/arch/amd64/{vmcall,vmmcall}.  As I was opting for always
> > aborting VMMCALL/VMCALL the actual instruction sequence size was
> > not relevant.
> >
> > If we want to skip VMMCALL/VMCALL (or acutally implement VM calls
> > besides HVCALL_FORCED_ABORT) I think we would have to copy in up
> > to 15 bytes (and maybe consider page boundaries and EFAULT, etc.)
> > and decode the sequence.  Another option might be to only allow 0f
> > 01 c1 at RIP ("pure" vm call instruction) and abort everything else
> > (padding is used) as before.
> >
>
> I think VMX will give us a valid instruction length we can
> retrieve and trust.
>
> For SVM, I think nRIP is applicable here and should be accurate? I think
> we're at a point where we might want to enforce nRIP support if we don't
> already. (I think this has come up before.)
>

Let's just enforce nRIP requirements at this time.

This should make this diff series much easier. Either of you have time today
to work on that? LMK (or I can see if I can carve out a few hours).

-ml

> >> Would a compiler every output something like this? Probably not. But
> >> it's totally legal to just pad instructions like this! Here's a 10-byte
> >> VMCALL:
> >>
> >> $ cstool -d x64 "40 40 40 40 40 40 40 0F 01 C1"
> >>  0  40 40 40 40 40 40 40 0f 01 c1                    vmcall
> >>         ID: 1011 (vmcall)
> >>         Prefix:0x00 0x00 0x00 0x00
> >>         Opcode:0x0f 0x01 0x00 0x00
> >>         rex: 0x40
> >>         addr_size: 8
> >>         modrm: 0xc1
> >>         disp: 0x0
> >>         sib: 0x0
> >>         Groups: privilege vm
> >>
> >> > +		update_rip = 1;
> >> >  		break;
> >> >  	default:
> >> >  		DPRINTF("%s: unhandled exit 0x%llx (pa=0x%llx)\n", __func__,
> >> > @@ -4741,8 +4748,15 @@ vmx_handle_exit(struct vcpu *vcpu)
> >> >  		    vcpu->vc_gueststate.vg_rax == HVCALL_FORCED_ABORT)
> >> >  			return (EINVAL);
> >> >  		DPRINTF("VMX_EXIT_VMCALL at cpl=%d\n", guest_cpl);
> >> > -		ret = vmm_inject_ud(vcpu);
> >> > -		update_rip = 0;
> >> > +		if (guest_cpl != 0) {
> >> > +			ret = vmm_inject_ud(vcpu);
> >> > +			update_rip = 0;
> >> > +			break;
> >> > +		}
> >> > +		/* Ring 0 unsupported hypercall - return error */
> >> > +		vcpu->vc_gueststate.vg_rax = -1;
> >> > +		vcpu->vc_gueststate.vg_rip += 3; /* VMCALL is 3 bytes long */
> >>
> >> Same issue here technically.
> >>
> >> > +		update_rip = 1;
> >> >  		break;
> >> >  	default:
> >> >  #ifdef VMM_DEBUG
> >>
>