Index | Thread | Search

From:
hshoexer <hshoexer@yerbouti.franken.de>
Subject:
Re: vmm(4): return error for VMCALL instead of injecting #UD
To:
tech@openbsd.org
Date:
Mon, 16 Mar 2026 18:53:46 +0100

Download raw body.

Thread
Hi,

On Mon, Mar 16, 2026 at 11:26:14AM -0400, Dave Voutila wrote:
> Miguel Landaeta <miguel@miguel.cc> writes:
> 
> > On Mon, Mar 16, 2026 at 11:51:00AM +0000, Miguel Landaeta wrote:
> >>
> >> Does this approach make sense, or is there a better way to handle this?
> 
> Did this just start happening with linux guests? I haven't seen this
> yet. I think sf@ changed how we advertise ourselves as KVM to Linux
> awhile ago specifically to get Linux to properly use the kvm
> paravirtualized clock.
> 
> Not saying we shouldn't fix this if it's broken, but confused why this
> hasn't popped up on my machines.
> 
> >
> > Sending a second revision. I realized that vmm(4) should continue
> > to inject #UD if the hypercall isn't from ring 0 (this case was missed
> > in the previous patch).
> 
> Is #UD correct when cpl > 0? My go-to reference for this stuff says it's
> #GP(0): https://www.felixcloutier.com/x86/vmcall

good point!

> mlarkin@ & hshoexer@ were working on improving some VMCALL handling I
> think. Curious their thoughts here.

I think we have different behaviour with AMDs VMMCALL and Intels
VMCALL instruction:  With VMMCALL I think #UD is the right way to
forcefully abort VMMCALL, as VMMCALL only raises #UD.   And on Intel
with VMCALL it should be #GP(0) for CPL > 0.

So regress/sys/arch/amd/vmcall needs to be adjusted, too.

> >
> >
> > Index: sys/arch/amd64/amd64/vmm_machdep.c
> > ===================================================================
> > RCS file: /home/cvs/src/sys/arch/amd64/amd64/vmm_machdep.c,v
> > diff -u -p -u -p -r1.72 vmm_machdep.c
> > --- sys/arch/amd64/amd64/vmm_machdep.c	16 Feb 2026 15:08:41 -0000	1.72
> > +++ sys/arch/amd64/amd64/vmm_machdep.c	16 Mar 2026 14:00:43 -0000
> > @@ -4301,8 +4301,15 @@ svm_handle_exit(struct vcpu *vcpu)
> >  		    vcpu->vc_gueststate.vg_rax == HVCALL_FORCED_ABORT)
> >  			return (EINVAL);
> >  		DPRINTF("SVM_VMEXIT_VMMCALL at cpl=%d\n", guest_cpl);
> > -		ret = vmm_inject_ud(vcpu);
> > -		update_rip = 0;
> > +		if (guest_cpl != 0) {
> > +			ret = vmm_inject_ud(vcpu);
> > +			update_rip = 0;
> > +			break;
> > +		}
> > +		/* Ring 0 unsupported hypercall - return error */
> > +		vcpu->vc_gueststate.vg_rax = -1;
> > +		vcpu->vc_gueststate.vg_rip += 3; /* VMMCALL is 3 bytes long */
> 
> vmm(4) has this rosey view of the world in places and I really think it
> shouldn't. Technically any x86 instruction can have zero or many
> repeated prefixes that are ignored so the actual length of VMCALL could
> be between 3 and 15 bytes inclusive.

correct.  That's quite tricky.  I experimented with this when writing
regress/sys/arch/amd64/{vmcall,vmmcall}.  As I was opting for always
aborting VMMCALL/VMCALL the actual instruction sequence size was
not relevant.

If we want to skip VMMCALL/VMCALL (or acutally implement VM calls
besides HVCALL_FORCED_ABORT) I think we would have to copy in up
to 15 bytes (and maybe consider page boundaries and EFAULT, etc.)
and decode the sequence.  Another option might be to only allow 0f
01 c1 at RIP ("pure" vm call instruction) and abort everything else
(padding is used) as before.

> Would a compiler every output something like this? Probably not. But
> it's totally legal to just pad instructions like this! Here's a 10-byte
> VMCALL:
> 
> $ cstool -d x64 "40 40 40 40 40 40 40 0F 01 C1"
>  0  40 40 40 40 40 40 40 0f 01 c1                    vmcall
>         ID: 1011 (vmcall)
>         Prefix:0x00 0x00 0x00 0x00
>         Opcode:0x0f 0x01 0x00 0x00
>         rex: 0x40
>         addr_size: 8
>         modrm: 0xc1
>         disp: 0x0
>         sib: 0x0
>         Groups: privilege vm
> 
> > +		update_rip = 1;
> >  		break;
> >  	default:
> >  		DPRINTF("%s: unhandled exit 0x%llx (pa=0x%llx)\n", __func__,
> > @@ -4741,8 +4748,15 @@ vmx_handle_exit(struct vcpu *vcpu)
> >  		    vcpu->vc_gueststate.vg_rax == HVCALL_FORCED_ABORT)
> >  			return (EINVAL);
> >  		DPRINTF("VMX_EXIT_VMCALL at cpl=%d\n", guest_cpl);
> > -		ret = vmm_inject_ud(vcpu);
> > -		update_rip = 0;
> > +		if (guest_cpl != 0) {
> > +			ret = vmm_inject_ud(vcpu);
> > +			update_rip = 0;
> > +			break;
> > +		}
> > +		/* Ring 0 unsupported hypercall - return error */
> > +		vcpu->vc_gueststate.vg_rax = -1;
> > +		vcpu->vc_gueststate.vg_rip += 3; /* VMCALL is 3 bytes long */
> 
> Same issue here technically.
> 
> > +		update_rip = 1;
> >  		break;
> >  	default:
> >  #ifdef VMM_DEBUG
>