Index | Thread | Search

From:
Stefan Fritsch <sf@openbsd.org>
Subject:
Re: Use acpipci in VMs
To:
Dave Voutila <dv@sisu.io>
Cc:
Mark Kettenis <mark.kettenis@xs4all.nl>, tech@openbsd.org
Date:
Mon, 31 Mar 2025 17:58:26 +0200

Download raw body.

Thread
On Mon, 31 Mar 2025, Dave Voutila wrote:

> Mark Kettenis <mark.kettenis@xs4all.nl> writes:
> 
> >> Date: Fri, 21 Mar 2025 14:24:08 +0100 (CET)
> >> From: Stefan Fritsch <sf@openbsd.org>
> >
> > Hi Stefan,
> >
> >> Hi,
> >>
> >> On Sun, 16 Mar 2025, Stefan Fritsch wrote:
> >> > there were some reports that vio on KVM/qemu sometimes panics with
> >> >
> >> >  vq_size not power of two: 65535
> >> >
> >> > but I could never reproduce it. bluhm@ now got me a test setup where the
> >> > bsd kernel is PXE booted on qemu in 440fx mode, and there it is
> >> > reproducible.
> >> >
> >> > After some debugging it seems that seabios or ipxe maps the PCI BARs at
> >> > 0x380000000000-0x380080000000 which is outside the allowed range in
> >> > pci_init_extents(). On the other hand, in 440fx mode, qemu seems to
> >> > produce ACPI 1.x tables and there is a check in acpipci_attach() that for
> >> > ACPI < 5.x, the PCI infos from _CRS are not used. OpenBSD will then
> >> > disable the BARs and when mapping them again in vio_attach(), it will
> >> > sometimes choose adresses that do not work, reads return 0xff and writes
> >> > are ignored. I guess this is becuase the address (in my case 0xbff14000)
> >> > lies outside the PCI window of the emulated chipset.
> >> >
> >> > I have put dmesg, acpi tables and other info at
> >> > https://www.sfritsch.de/~stf/vq-panic/
> >> >
> >> > Qemu in q35 mode produces ACPI 3.x tables, so it may also be affected.
> >> >
> >> > There may be three ways to fix this:
> >> >
> >> > 1) increase the allowed range for pcimem in pci_init_extents(). This is
> >> > what the diff below does.
> >> >
> >> > 2) somehow make acpipci_attach() use the ACPI infos on qemu. I have
> >> > verified that removing the version check fixes the issue. Since removing
> >> > the version check seems to break many other systems, this would have to be
> >> > a qemu specific quirk.
> >> >
> >> > 3) try to make OpenBSD reliably map the BARs somewhere where it works. Is
> >> > there a way for OpenBSD to get the info where the PCI window is without
> >> > trusting ACPI?
> >> >
> >> > I remember at least one report of this issue on i386. Any idea how to fix
> >> > it there?
> >>
> >> Mark Patruck noticed that these issues seem to be caused by some
> >> relatively recent changes in seabios.
> >>
> >> https://mail.coreboot.org/hyperkitty/list/seabios@seabios.org/message/R7FOQMMYWVX577QNIA2AKUAGOZKNJIAP/
> >> https://gitlab.com/qemu-project/seabios/-/commit/df9dd418b3b0e586cb208125094620fc7f90f23d
> >>
> >> A workaround seems to be to configure the VM with <= 3GB memory.
> >>
> >> The problem may become more wide-spread with 7.7, since we now default to
> >> virtio 1.x, which uses MMIO on qemu, compared to virtio 0.9 which uses PIO
> >> BARs. Therefore it would be nice to get a fix in before the release, if it
> >> is not too late already.
> >>
> >> The diff below uses acpipci / _CRS also with old ACPI versions if running
> >> on a hypervisor. I think the chance that it will break unrelated systems
> >> is low. It does not change behavior on vmd, where no acpi attaches at all.
> >>
> >> ok?
> >
> > I hate these VM quirks.  Why are folks still emulating hardware from
> > the 1990's when running a modern OS?
> 
> ...like vmd(8). /gasp /vomit /shudder
> 
> At least our ACPI isn't broken (because it doesn't exist).
> 
> >
> > Anyway, not much we can do about that I guess.  But maybe we can have
> > a bit more consistency?  We already have a check to enable MSI for
> > QEMU.  And since this really is a QEMU issue, maybe it would be better
> > to use a PCI_SUBSYS_ID_REG check here?

I am not sure. We have a quirk to detect old systems where ACPI cannot be 
trusted. If the HV cpuid bit is set, the system cannot be that old and it 
is reasonable not to use the broken-acpi quirk.

A diff with PCI_SUBSYS_ID_REG is attached below and works, too. I like the 
cpuid diff better. But if you want the PCI_SUBSYS_ID_REG diff, let's 
commit that so this get's fixed before release.



> >
> >> diff --git a/sys/arch/amd64/pci/acpipci.c b/sys/arch/amd64/pci/acpipci.c
> >> index 51cd1360383..2e3236772bb 100644
> >> --- a/sys/arch/amd64/pci/acpipci.c
> >> +++ b/sys/arch/amd64/pci/acpipci.c
> >> @@ -152,7 +152,7 @@ acpipci_attach(struct device *parent, struct device *self, void *aux)
> >>
> >>  	aml_parse_resource(&res, acpipci_parse_resources, sc);
> >>
> >> -	if (sc->sc_acpi->sc_major < 5) {
> >> +	if (sc->sc_acpi->sc_major < 5 && (cpu_ecxfeature & CPUIDECX_HV) == 0) {
> >>  		extent_destroy(sc->sc_ioex);
> >>  		extent_destroy(sc->sc_memex);
> >>
> >>
> >>
> 



diff --git a/sys/arch/amd64/pci/acpipci.c b/sys/arch/amd64/pci/acpipci.c
index 51cd1360383..b546747306a 100644
--- a/sys/arch/amd64/pci/acpipci.c
+++ b/sys/arch/amd64/pci/acpipci.c
@@ -153,12 +153,25 @@ acpipci_attach(struct device *parent, struct device *self, void *aux)
 	aml_parse_resource(&res, acpipci_parse_resources, sc);
 
 	if (sc->sc_acpi->sc_major < 5) {
-		extent_destroy(sc->sc_ioex);
-		extent_destroy(sc->sc_memex);
+		pcitag_t tag;
+		pcireg_t id;
 
-		pci_init_extents();
-		sc->sc_ioex =  pciio_ex;
-		sc->sc_memex = pcimem_ex;
+		tag = pci_make_tag(0, sc->sc_bus, 0, 0);
+		id = pci_conf_read(0, tag, PCI_SUBSYS_ID_REG);
+
+		/*
+		 * We don't trust ACPI versions < 5, except on qemu, which
+		 * emulates ancient hardware with ACPI 1 tables.
+		 */
+		if (PCI_VENDOR(id) != PCI_VENDOR_QUMRANET)
+		{
+			extent_destroy(sc->sc_ioex);
+			extent_destroy(sc->sc_memex);
+
+			pci_init_extents();
+			sc->sc_ioex =  pciio_ex;
+			sc->sc_memex = pcimem_ex;
+		}
 	}
 
 	printf("\n");