From: Hans-Jörg Höxer
Subject: Re: [EXT] Re: SEV-ES guest: locore #VC trap handling
To:
Date: Tue, 24 Jun 2025 14:12:52 +0200

Hi,

see below for some more answers.

On Fri, Jun 20, 2025 at 09:06:37PM +0200, Alexander Bluhm wrote:
> ..
> > > @@ -193,6 +194,58 @@ bi_size_ok:
> > >  	pushl	$PSL_MBO
> > >  	popfl
> > >
> > > +	/*
> > > +	 * Setup temporary #VC trap handler, in case we are running
> > > +	 * on an AMD CPU in SEV-ES guest mode.  Will be reset by
> > > +	 * init_x86_64().
> > > +	 * We are setting up two handlers:
> > > +	 *
> > > +	 * 1) locore_vc_trap32: Triggered when we are running in
> > > +	 *    32-bit legacy mode.
> > > +	 *
> > > +	 * 2) locore_vc_trap64: Triggered when we are running in
> > > +	 *    32-bit compatibility mode.
> > > +	 *
> > > +	 * The latter one is used by vmd(8).
> >
> > Please clarify; *when* is this used by vmd?  I believe you mean when
> > we do a direct kernel launch?  If not, then why do we need the 32 bit
> > one?
>
> There are two ways we may enter the kernel.  KVM/qemu uses a special
> TianoCore EFI implementation.  For vmm/vmd we currently support
> direct kernel boot only.  hshoexer@ explained to me why we need
> both 32-bit methods here, but I forgot.

With direct kernel launch, vmd(8) sets up compatibility mode: this is
basically long mode (64-bit) with the 32-bit code executing in a 32-bit
segment.  Exceptions (and interrupts), however, are handled by long-mode
rules.  So we need a 64-bit IDT entry, which may only reference a
long-mode code segment, and therefore a 64-bit handler.  On Linux/KVM we
seem to actually run in legacy mode, so there we need a 32-bit handler.
(See the first sketch in the P.S. below for the two gate layouts.)

In the long run I think we want to use EFI boot on vmd/vmm anyway, so we
might be able to simplify this code in the future.

> > >  	/* XXX merge these */
> > >  	call	init_x86_64
> > >  	call	main
> > >
> > > +	/* MSR Protocol Request Codes */
> > > +#define	MSRPROTO_CPUID_REQ	0x4
> > > +#define	MSRPROTO_TERM_REQ	0x100
> > > +
> > > +vc_cpuid64:
> > > +	shll	$30, %eax		/* requested register */
> > > +	orl	$MSRPROTO_CPUID_REQ, %eax
> > > +	movl	%ebx, %edx		/* CPUID function */
> > > +	movl	$MSR_SEV_GHCB, %ecx
> > > +	wrmsr
> > > +	rep vmmcall
> >
> > Out of curiosity, why is the rep prefix needed here?
>
> I don't know.  hshoexer?

vmmcall and vmgexit are basically the same instruction.  In APM vol. 2,
vmmcall is defined as "0x0f 0x01 0xd9" and vmgexit as "0xf2/0xf3 0x0f
0x01 0xd9", with 0xf2 and 0xf3 being the repne and rep prefixes.  As we
do not have a vmgexit mnemonic in llvm yet, using "rep vmmcall" produces
the correct byte sequence; I think I saw this in Linux kernel code.
Another option would be to emit the bytes with .byte (see the second
sketch in the P.S. below).  And of course we could add the vmgexit
mnemonic to llvm.

As far as I understand APM vol. 2, the only difference between vmmcall
and vmgexit is that vmmcall raises #UD when the hypervisor has not
configured the vmmcall intercept, and the guest does not exit.  vmgexit
always exits the guest when SEV-ES is enabled; when SEV-ES is not
enabled, or the CPU does not support SEV-ES, vmgexit behaves like
vmmcall.

Take care,
Hans-Joerg
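
P.S.  To illustrate the gate-size difference mentioned above, here is a
minimal, hypothetical sketch of the two IDT entry layouts (not the diff
code; the selector 0x08 and the zeroed handler offsets are placeholders):

	/* Legacy-mode interrupt gate, 8 bytes: what locore_vc_trap32
	 * would be installed in. */
idt_legacy_gate:
	.word	0x0000			/* handler offset 15:0 */
	.word	0x0008			/* code segment selector (assumed) */
	.byte	0x00			/* reserved */
	.byte	0x8e			/* P=1, DPL=0, 32-bit intr gate */
	.word	0x0000			/* handler offset 31:16 */

	/* Long-mode interrupt gate, 16 bytes: what locore_vc_trap64
	 * would be installed in.  It may only reference a 64-bit
	 * code segment. */
idt_long_gate:
	.word	0x0000			/* handler offset 15:0 */
	.word	0x0008			/* 64-bit code segment selector */
	.byte	0x00			/* IST index 0 */
	.byte	0x8e			/* P=1, DPL=0, 64-bit intr gate */
	.word	0x0000			/* handler offset 31:16 */
	.long	0x00000000		/* handler offset 63:32 */
	.long	0x00000000		/* reserved */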
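
P.P.S.  And a minimal sketch of the .byte alternative for vmgexit (again
hypothetical; the MSR-protocol request setup is the same as in the diff
above, with rdmsr fetching the hypervisor's response per the GHCB MSR
protocol):

	movl	$MSR_SEV_GHCB, %ecx	/* GHCB MSR (0xc0010130) */
	wrmsr				/* EDX:EAX = request to hypervisor */
	.byte	0xf3, 0x0f, 0x01, 0xd9	/* vmgexit = rep (0xf3) + vmmcall */
	rdmsr				/* EDX:EAX = hypervisor's response */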