From: Stefan Fritsch
Subject: Re: virtio_pci change for SEV
To: Mark Kettenis
Cc: tech@openbsd.org
Date: Wed, 9 Oct 2024 11:25:45 +0200

On Tue, 8 Oct 2024, Mark Kettenis wrote:
> > Date: Tue, 8 Oct 2024 09:14:29 +0200 (CEST)
> > From: Stefan Fritsch
> >
> > Hi,
> >
> > we always use the standard pci busdma machinery. Therefore we should
> > negotiate the VIRTIO_F_ACCESS_PLATFORM feature if it is offered.
> > Strictly speaking we should bypass any existing iommus if the host
> > does not offer this feature, but this is no regression and we can do
> > that when/if we introduce our own busdma tag for virtio.
> >
> > Accepting VIRTIO_F_ACCESS_PLATFORM is required for SEV on KVM/qemu.
> >
> > ok?
>
> Sorry, but your explanation makes no sense. What does "the standard
> pci busdma machinery" mean? I think it means different things on
> different architectures.
>
> The description of the VIRTIO_F_ACCESS_PLATFORM flag is very vague. I
> think the only possible interpretation is that not setting the flag
> means "we're Linux and you can make assumptions and optimize things".
> So indeed setting that flag unconditionally may make sense. But we
> need a better comment.

Yes, the description is rather vague. I interpret it like this: with
the feature enabled we should behave as we would on real hardware WRT
address translation / IOMMUs / cache flushes, while with it disabled we
should always put physical addresses into the rings, even if there
seems to be an iommu. For the IOMMU part, I know that this is how
qemu/KVM behave in practice. What "like on real hardware" means exactly
is of course architecture dependent, but so is the meaning of the
feature flag.

There is also VIRTIO_F_ORDER_PLATFORM, with an equally vague
description. I think we should set this flag as well, since we don't
use any optimized weaker memory barriers. In practice, though, this
seems to be relevant only for virtio implemented in hardware/FPGAs,
like on some Mellanox NICs.

Updated diff below that sets both flags and has more verbose comments.

ok?

Cheers,
Stefan


commit 3a4e0ac06eec036ec17616b88716701f3cf641cc
Author: Stefan Fritsch
Date:   Mon Sep 16 16:02:35 2024 +0200

    virtio_pci: Negotiate ACCESS_PLATFORM/ORDER_PLATFORM features

    We always use the standard pci access mechanisms as we would on real
    hardware. Therefore we should negotiate the VIRTIO_F_ACCESS_PLATFORM
    feature if it is offered. Strictly speaking we should bypass any
    existing iommus if the host does not offer this feature, but this is
    no regression and is left for later.

    Also, we use the memory ordering / barriers as provided by
    bus_dmamap_sync() for real hardware. Therefore, we should negotiate
    the VIRTIO_F_ORDER_PLATFORM feature.

    Accepting VIRTIO_F_ACCESS_PLATFORM is required for SEV on KVM/qemu.

diff --git a/sys/dev/pci/virtio_pci.c b/sys/dev/pci/virtio_pci.c
index f9c8801ceb7..0302794f312 100644
--- a/sys/dev/pci/virtio_pci.c
+++ b/sys/dev/pci/virtio_pci.c
@@ -821,6 +821,24 @@ virtio_pci_negotiate_features_10(struct virtio_softc *vsc,
 	uint64_t host, negotiated;

 	vsc->sc_driver_features |= VIRTIO_F_VERSION_1;
+	/*
+	 * We always use the normal platform specific pci busdma API.
+	 * Therefore we use the access mechanisms and translations as
+	 * defined by the platform we are running on (as if it were
+	 * real hardware).
+	 *
+	 * XXX If the hypervisor does not offer VIRTIO_F_ACCESS_PLATFORM,
+	 * XXX we should bypass any iommu that is present.
+	 */
+	vsc->sc_driver_features |= VIRTIO_F_ACCESS_PLATFORM;
+	/*
+	 * We always use pci specific memory ordering as implemented by
+	 * bus_dmamap_sync().
+	 *
+	 * XXX If the hypervisor does not offer VIRTIO_F_ORDER_PLATFORM,
+	 * XXX we could use weaker barriers.
+	 */
+	vsc->sc_driver_features |= VIRTIO_F_ORDER_PLATFORM;
 	/* notify on empty is 0.9 only */
 	vsc->sc_driver_features &= ~VIRTIO_F_NOTIFY_ON_EMPTY;
 	CWRITE(sc, device_feature_select, 0);
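
P.S. Not part of the diff: if a child driver wants to check the outcome
of the negotiation, a minimal sketch (assuming the virtio_has_feature()
helper from virtiovar.h and an attached struct virtio_softc *vsc) would
look like:

	/*
	 * Sketch only: test whether VIRTIO_F_ACCESS_PLATFORM was both
	 * offered by the host and accepted during negotiation.
	 */
	if (virtio_has_feature(vsc, VIRTIO_F_ACCESS_PLATFORM)) {
		/*
		 * Addresses in the rings go through the platform's
		 * translation (iommu, SEV, ...), as on real hardware.
		 */
	}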
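
And to illustrate the ORDER_PLATFORM point: it amounts to ordering ring
updates through bus_dmamap_sync(9) rather than through hand-rolled
weaker barriers. A rough sketch with hypothetical tag/map/offset
variables (dmat, ring_map, desc_off, ...), not the actual virtio(4)
ring code:

	/* make the filled descriptors visible to the device ... */
	bus_dmamap_sync(dmat, ring_map, desc_off, desc_len,
	    BUS_DMASYNC_PREWRITE);
	avail->idx = new_idx;	/* ... then publish the avail index */
	/* ... and flush the index update before notifying the device */
	bus_dmamap_sync(dmat, ring_map, avail_off, avail_len,
	    BUS_DMASYNC_PREWRITE);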