Index | Thread | Search

From:
Mike Larkin <mlarkin@nested.page>
Subject:
Re: vmm(4): return error for VMCALL instead of injecting #UD
To:
Dave Voutila <dv@sisu.io>
Cc:
Miguel Landaeta <miguel@miguel.cc>, tech@openbsd.org
Date:
Mon, 16 Mar 2026 08:30:40 -0700

Download raw body.

Thread
On Mon, Mar 16, 2026 at 11:26:14AM -0400, Dave Voutila wrote:
> Miguel Landaeta <miguel@miguel.cc> writes:
>
> > On Mon, Mar 16, 2026 at 11:51:00AM +0000, Miguel Landaeta wrote:
> >>
> >> Does this approach make sense, or is there a better way to handle this?
>
> Did this just start happening with linux guests? I haven't seen this
> yet. I think sf@ changed how we advertise ourselves as KVM to Linux
> awhile ago specifically to get Linux to properly use the kvm
> paravirtualized clock.
>
> Not saying we shouldn't fix this if it's broken, but confused why this
> hasn't popped up on my machines.
>
> >
> > Sending a second revision. I realized that vmm(4) should continue
> > to inject #UD if the hypercall isn't from ring 0 (this case was missed
> > in the previous patch).
>
> Is #UD correct when cpl > 0? My go-to reference for this stuff says it's
> #GP(0): https://www.felixcloutier.com/x86/vmcall
>
> mlarkin@ & hshoexer@ were working on improving some VMCALL handling I
> think. Curious their thoughts here.
>

yeah I think this is a combination of things that have recently come together:

1. recent changes for KVM identification
2. recent changes in VMCALL/VMMCALL handling
3. recent (?) changes in linux

Will take a look today and see what I can find.

-ml

> >
> >
> > Index: sys/arch/amd64/amd64/vmm_machdep.c
> > ===================================================================
> > RCS file: /home/cvs/src/sys/arch/amd64/amd64/vmm_machdep.c,v
> > diff -u -p -u -p -r1.72 vmm_machdep.c
> > --- sys/arch/amd64/amd64/vmm_machdep.c	16 Feb 2026 15:08:41 -0000	1.72
> > +++ sys/arch/amd64/amd64/vmm_machdep.c	16 Mar 2026 14:00:43 -0000
> > @@ -4301,8 +4301,15 @@ svm_handle_exit(struct vcpu *vcpu)
> >  		    vcpu->vc_gueststate.vg_rax == HVCALL_FORCED_ABORT)
> >  			return (EINVAL);
> >  		DPRINTF("SVM_VMEXIT_VMMCALL at cpl=%d\n", guest_cpl);
> > -		ret = vmm_inject_ud(vcpu);
> > -		update_rip = 0;
> > +		if (guest_cpl != 0) {
> > +			ret = vmm_inject_ud(vcpu);
> > +			update_rip = 0;
> > +			break;
> > +		}
> > +		/* Ring 0 unsupported hypercall - return error */
> > +		vcpu->vc_gueststate.vg_rax = -1;
> > +		vcpu->vc_gueststate.vg_rip += 3; /* VMMCALL is 3 bytes long */
>
> vmm(4) has this rosey view of the world in places and I really think it
> shouldn't. Technically any x86 instruction can have zero or many
> repeated prefixes that are ignored so the actual length of VMCALL could
> be between 3 and 15 bytes inclusive.
>
> Would a compiler every output something like this? Probably not. But
> it's totally legal to just pad instructions like this! Here's a 10-byte
> VMCALL:
>
> $ cstool -d x64 "40 40 40 40 40 40 40 0F 01 C1"
>  0  40 40 40 40 40 40 40 0f 01 c1                    vmcall
>         ID: 1011 (vmcall)
>         Prefix:0x00 0x00 0x00 0x00
>         Opcode:0x0f 0x01 0x00 0x00
>         rex: 0x40
>         addr_size: 8
>         modrm: 0xc1
>         disp: 0x0
>         sib: 0x0
>         Groups: privilege vm
>
> > +		update_rip = 1;
> >  		break;
> >  	default:
> >  		DPRINTF("%s: unhandled exit 0x%llx (pa=0x%llx)\n", __func__,
> > @@ -4741,8 +4748,15 @@ vmx_handle_exit(struct vcpu *vcpu)
> >  		    vcpu->vc_gueststate.vg_rax == HVCALL_FORCED_ABORT)
> >  			return (EINVAL);
> >  		DPRINTF("VMX_EXIT_VMCALL at cpl=%d\n", guest_cpl);
> > -		ret = vmm_inject_ud(vcpu);
> > -		update_rip = 0;
> > +		if (guest_cpl != 0) {
> > +			ret = vmm_inject_ud(vcpu);
> > +			update_rip = 0;
> > +			break;
> > +		}
> > +		/* Ring 0 unsupported hypercall - return error */
> > +		vcpu->vc_gueststate.vg_rax = -1;
> > +		vcpu->vc_gueststate.vg_rip += 3; /* VMCALL is 3 bytes long */
>
> Same issue here technically.
>
> > +		update_rip = 1;
> >  		break;
> >  	default:
> >  #ifdef VMM_DEBUG
>