Download raw body.
vmm(4): return error for VMCALL instead of injecting #UD
hshoexer <hshoexer@yerbouti.franken.de> writes:
> Hi,
>
> On Mon, Mar 16, 2026 at 11:26:14AM -0400, Dave Voutila wrote:
>> Miguel Landaeta <miguel@miguel.cc> writes:
>>
>> > On Mon, Mar 16, 2026 at 11:51:00AM +0000, Miguel Landaeta wrote:
>> >>
>> >> Does this approach make sense, or is there a better way to handle this?
>>
>> Did this just start happening with linux guests? I haven't seen this
>> yet. I think sf@ changed how we advertise ourselves as KVM to Linux
>> awhile ago specifically to get Linux to properly use the kvm
>> paravirtualized clock.
>>
>> Not saying we shouldn't fix this if it's broken, but confused why this
>> hasn't popped up on my machines.
>>
>> >
>> > Sending a second revision. I realized that vmm(4) should continue
>> > to inject #UD if the hypercall isn't from ring 0 (this case was missed
>> > in the previous patch).
>>
>> Is #UD correct when cpl > 0? My go-to reference for this stuff says it's
>> #GP(0): https://www.felixcloutier.com/x86/vmcall
>
> good point!
>
>> mlarkin@ & hshoexer@ were working on improving some VMCALL handling I
>> think. Curious their thoughts here.
>
> I think we have different behaviour with AMDs VMMCALL and Intels
> VMCALL instruction: With VMMCALL I think #UD is the right way to
> forcefully abort VMMCALL, as VMMCALL only raises #UD. And on Intel
> with VMCALL it should be #GP(0) for CPL > 0.
>
> So regress/sys/arch/amd/vmcall needs to be adjusted, too.
>
>> >
>> >
>> > Index: sys/arch/amd64/amd64/vmm_machdep.c
>> > ===================================================================
>> > RCS file: /home/cvs/src/sys/arch/amd64/amd64/vmm_machdep.c,v
>> > diff -u -p -u -p -r1.72 vmm_machdep.c
>> > --- sys/arch/amd64/amd64/vmm_machdep.c 16 Feb 2026 15:08:41 -0000 1.72
>> > +++ sys/arch/amd64/amd64/vmm_machdep.c 16 Mar 2026 14:00:43 -0000
>> > @@ -4301,8 +4301,15 @@ svm_handle_exit(struct vcpu *vcpu)
>> > vcpu->vc_gueststate.vg_rax == HVCALL_FORCED_ABORT)
>> > return (EINVAL);
>> > DPRINTF("SVM_VMEXIT_VMMCALL at cpl=%d\n", guest_cpl);
>> > - ret = vmm_inject_ud(vcpu);
>> > - update_rip = 0;
>> > + if (guest_cpl != 0) {
>> > + ret = vmm_inject_ud(vcpu);
>> > + update_rip = 0;
>> > + break;
>> > + }
>> > + /* Ring 0 unsupported hypercall - return error */
>> > + vcpu->vc_gueststate.vg_rax = -1;
>> > + vcpu->vc_gueststate.vg_rip += 3; /* VMMCALL is 3 bytes long */
>>
>> vmm(4) has this rosey view of the world in places and I really think it
>> shouldn't. Technically any x86 instruction can have zero or many
>> repeated prefixes that are ignored so the actual length of VMCALL could
>> be between 3 and 15 bytes inclusive.
>
> correct. That's quite tricky. I experimented with this when writing
> regress/sys/arch/amd64/{vmcall,vmmcall}. As I was opting for always
> aborting VMMCALL/VMCALL the actual instruction sequence size was
> not relevant.
>
> If we want to skip VMMCALL/VMCALL (or acutally implement VM calls
> besides HVCALL_FORCED_ABORT) I think we would have to copy in up
> to 15 bytes (and maybe consider page boundaries and EFAULT, etc.)
> and decode the sequence. Another option might be to only allow 0f
> 01 c1 at RIP ("pure" vm call instruction) and abort everything else
> (padding is used) as before.
>
I think VMX will give us a valid instruction length we can
retrieve and trust.
For SVM, I think nRIP is applicable here and should be accurate? I think
we're at a point where we might want to enforce nRIP support if we don't
already. (I think this has come up before.)
>> Would a compiler every output something like this? Probably not. But
>> it's totally legal to just pad instructions like this! Here's a 10-byte
>> VMCALL:
>>
>> $ cstool -d x64 "40 40 40 40 40 40 40 0F 01 C1"
>> 0 40 40 40 40 40 40 40 0f 01 c1 vmcall
>> ID: 1011 (vmcall)
>> Prefix:0x00 0x00 0x00 0x00
>> Opcode:0x0f 0x01 0x00 0x00
>> rex: 0x40
>> addr_size: 8
>> modrm: 0xc1
>> disp: 0x0
>> sib: 0x0
>> Groups: privilege vm
>>
>> > + update_rip = 1;
>> > break;
>> > default:
>> > DPRINTF("%s: unhandled exit 0x%llx (pa=0x%llx)\n", __func__,
>> > @@ -4741,8 +4748,15 @@ vmx_handle_exit(struct vcpu *vcpu)
>> > vcpu->vc_gueststate.vg_rax == HVCALL_FORCED_ABORT)
>> > return (EINVAL);
>> > DPRINTF("VMX_EXIT_VMCALL at cpl=%d\n", guest_cpl);
>> > - ret = vmm_inject_ud(vcpu);
>> > - update_rip = 0;
>> > + if (guest_cpl != 0) {
>> > + ret = vmm_inject_ud(vcpu);
>> > + update_rip = 0;
>> > + break;
>> > + }
>> > + /* Ring 0 unsupported hypercall - return error */
>> > + vcpu->vc_gueststate.vg_rax = -1;
>> > + vcpu->vc_gueststate.vg_rip += 3; /* VMCALL is 3 bytes long */
>>
>> Same issue here technically.
>>
>> > + update_rip = 1;
>> > break;
>> > default:
>> > #ifdef VMM_DEBUG
>>
vmm(4): return error for VMCALL instead of injecting #UD