From: Tom Smyth Subject: Re: iavf: panic on QEMU with i440fx chipset emulation To: YASUOKA Masahiko Cc: naito.yuichiro@gmail.com, kettenis@openbsd.org, tech@openbsd.org Date: Mon, 7 Oct 2024 03:47:13 +0100 I agree with this analysis you need to use q35 vm type inorder to have iavf support in my experience... you need a PCI-E support for iavf(4) i440 vms dont hav the support also I have seen that Iavf(4) have bad crashes . nasty bugs ( like physical host rebooting when guest with IAVF attached on PCI (through IO/MMU) was started and or stopped It would be worth your while to update the firmware on the card on the host Hype this helps Tom Smyth On Fri, 4 Oct 2024 at 09:27, YASUOKA Masahiko wrote: > > On Wed, 25 Sep 2024 11:42:02 +0900 (JST) > Yuichiro NAITO wrote: > > Hi, While I'm testing iavf(4) on Linux QEMU with i440fx chipset emulation, > > I had the following panic message on my OpenBSD current build. > > I440fx is the default chipset emulation of QEMU. I know q35 chipset > > emulation is required for msix interrupts. But if users forget to select it, > > I tested what happens on the default chipset emulation. > > I think the root problem is this i440fx situation. > > The driver assumes the device works with msix. The physical device is > connected on PCI-E bus and msix is usable of cource. Since QEMU is > doing PCI-passthrough the device to the guest, I think QEMU should > configure a PCI bus which support msxi for the guest. But actually > msix is not usable on OpenBSD guest. > > On FreeBSD, it seems to work around this problem by PCI_QUIRK_ENABLE_MSI_VM > > https://github.com/freebsd/freebsd-src/blob/release/14.1.0/sys/dev/pci/pci.c#L279-L283 > > I think this kind of quirk is needed for us. > What do you think? > > > > > ``` > > attempt to execute user address 0x0 in supervisor mode > > kernel: page fault trap, code=10 > > Stopped at 0 TID PID UID PRFLAGS PFLAGS CPU COMMAND > > * 20243 98235 0 0x14000 0x200 0 softnet0 > > locore0.S(0,ffff800000047018,fe00000000079ac2,ffffffffffffffff,ffffffffffffffff,0) at 0 > > taskq_thread(ffff800000047000,ffff800000047000,0,0,ffff800000047000,ffffffff8165acd0) at taskq_thread+0x129 [/usr/src/sys/kern/kern_task.c:451] > > end trace frame: 0x0, count: 14 > > https://www.openbsd.org/ddb.html describes the minimum info required in bug > > reports. Insufficient info makes it difficult to find and fix bugs. > > ddb{0}> > > ``` > > > > It seems that NULL pointer function is called by the 'taskq_thread'. > > Locore0.S is called for the indirect function call to do RETPOLINE > > provided by the '__x86_indirect_thunk_r11' function. > > > > Who sets the NULL pointer function? > > I added the following KASSERT in the 'task_add' function. > > > > ``` > > diff --git a/sys/kern/kern_task.c b/sys/kern/kern_task.c > > index c3dcc5f48cc..d3a2bb90391 100644 > > --- a/sys/kern/kern_task.c > > +++ b/sys/kern/kern_task.c > > @@ -355,6 +355,7 @@ task_add(struct taskq *tq, struct task *w) > > if (ISSET(w->t_flags, TASK_ONQUEUE)) > > return (0); > > > > + KASSERT(w->t_func != NULL); > > mtx_enter(&tq->tq_mtx); > > if (!ISSET(w->t_flags, TASK_ONQUEUE)) { > > rv = 1; > > > > ``` > > > > And then got the following backtrace. > > > > ``` > > panic: kernel diagnostic assertion "w->t_func != NULL" failed: file "/usr/src/sys/kern/kern_task.c", line 358 > > Stopped at db_enter+0x14 [/usr/src/sys/arch/amd64/amd64/db_interface.c:437 > > ]: popq %rbp > > TID PID UID PRFLAGS PFLAGS CPU COMMAND > > * 0 0 0 0x10000 0x200 0K swapper > > db_enter(10,ffffffff83a22770,282,8,ffffffff81425b84,ffffffff83a22770) at db_enter+0x14 [/usr/src/sys/arch/amd64/amd64/db_interface.c:437] > > panic(ffffffff824295f2,ffffffff824295f2,ffff800000046ae0,0,ffff800000135278,ffffffff827a0bee) at panic+0xdd [/usr/src/sys/kern/subr_prf.c:227] > > __assert(ffffffff823dd575,ffffffff82333c7c,166,ffffffff823bd1e5,ffffffff82150ec9,ffffffff83a22830) at __assert+0x29 [/usr/src/sys/kern/subr_prf.c:0] > > task_add(ffff800000047000,ffff800000135278,7e776ef89ea6f47,ffff800000046ae0,fffffd8003a99000,3) at task_add+0xff [/usr/src/sys/kern/kern_task.c:358] > > iavf_process_arq(ffff800000135000,1,70306e848b1765c5,63,ffff800000135000,fffffd8003a97000) at iavf_process_arq+0x451 [/usr/src/sys/dev/pci/if_iavf.c:2260] > > iavf_get_vf_resources(ffff800000135000,ffff800000135000,ae29a6590f97d8d3,ffff8000001355f0,ffff800000135000,ffff800000135600) at iavf_get_vf_resources+0x1cd [/usr/src/sys/dev/pci/if_iavf.c:2543] > > iavf_attach(ffff80000004a200,ffff800000135000,ffffffff83a22a88,ffff80000004a200,58717fcf7319db74,ffff80000004a200) at iavf_attach+0x3d6 [/usr/src/sys/dev/pci/if_iavf.c:850] > > config_attach(ffff80000004a200,ffffffff82784028,ffffffff83a22a88,ffffffff81f8e8 > > f0,19817d1faa76602d,80002000) at config_attach+0x22b [/usr/src/sys/kern/subr_autoconf.c:418] > > pci_probe_device(ffff80000004a200,80002000,0,0,7190d4a2ab338ea0,0) at pci_probe_device+0x595 [/usr/src/sys/dev/pci/pci.c:574] > > pci_enumerate_bus(ffff80000004a200,0,0,ffff80000004a200,4ebe87163eefc827,ffff800000047280) at pci_enumerate_bus+0x1bc [/usr/src/sys/dev/pci/pci.c:0] > > config_attach(ffff800000047280,ffffffff82781cf0,ffffffff83a22ca8,ffffffff81e582b0,19817d1ppfaa6b1255,ffff80000004aa00) at config_attach+0x22b [/usr/src/sys/kern/subr_autoconf.c:418] > > acpipci_attach_bus(ffff800000047280,ffff80000004aa00,8b4a87a3a22af14c,0,ffff800000047280,ffffffff8278c9d8) at acpipci_attach_bus+0x1dd [/usr/src/sys/arch/amd64/pci/acpipci.c:225] > > acpipci_attach_busses(ffff800000047280,ffff800000047280,72b2f1f7da24d7b0,ffffffff83a22dc8,ffff800000047280,ffff800000095980) at acpipci_attach_busses+0x5c [/usr/src/sys/arch/amd64/pci/acpipci.c:233] > > mainbus_attach(0,ffff800000047280,0,0,43135163af729900,0) at mainbus_attach+0x1f3 [/usr/src/sys/arch/amd64/amd64/mainbus.c:253] > > end trace frame: 0xffffffff83a22ee0, count: 0 > > https://www.openbsd.org/ddb.html describes the minimum info required in bug > > reports. Insufficient info makes it difficult to find and fix bugs. > > ddb{0}> > > ``` > > > > From the 'iavf_attach' function, 'iavf_get_vf_resources' is called. > > It expects the response of IAVF_VC_OP_GET_VF_RESOURCES. > > But this backtrace shows 'iavf_process_arq' function is called in the > > 'if_iavf.c:2260' line. It is in the 'iavf_process_vc_event' function. > > > > In my source code, around 'if_iavf.c:2260' is shown as follows. > > > > ``` > > 2231 static void > > 2232 iavf_process_vc_event(struct iavf_softc *sc, struct iavf_aq_desc *desc, > > 2233 struct iavf_aq_buf *buf) > > 2234 { > > 2235 struct iavf_vc_pf_event *event; > > 2236 struct ifnet *ifp = &sc->sc_ac.ac_if; > > 2237 const struct iavf_link_speed *speed; > > 2238 int link; > > 2239 > > 2240 event = buf->aqb_data; > > 2241 switch (event->event) { > > 2242 case IAVF_VC_EVENT_LINK_CHANGE: > > 2243 sc->sc_media_status = IFM_AVALID; > > 2244 sc->sc_media_active = IFM_ETHER; > > 2245 link = LINK_STATE_DOWN; > > 2246 if (event->link_status) { > > 2247 link = LINK_STATE_UP; > > 2248 sc->sc_media_status |= IFM_ACTIVE; > > 2249 > > 2250 ifp->if_baudrate = 0; > > 2251 speed = iavf_find_link_speed(sc, event->link_speed); > > 2252 if (speed != NULL) { > > 2253 sc->sc_media_active |= speed->media; > > 2254 ifp->if_baudrate = speed->baudrate; > > 2255 } > > 2256 } > > 2257 > > 2258 if (ifp->if_link_state != link) { > > 2259 ifp->if_link_state = link; > > 2260 if_link_state_change(ifp); > > 2261 } > > 2262 break; > > ``` > > > > It means that while waiting for IAVF_VC_OP_GET_VF_RESOURCES, > > IAVF_VC_OP_EVENT is returned and then changes the interface media status > > and reports it to userland via the 'if_link_state_change' function. > > > > In the 'if_link_state_change' function, '&ifp->if_linkstatetask' is > > passed to the 'task_add' function. The 'ifp' is a pointer to 'sturct ifnet' > > and the structure entity is in the driver context. So the 'if_linkstatetask' > > is zero cleared at first, and then set by the 'if_attach' function. > > > > Let's back to what the 'iavf_attach' function does. The key points are: > > > > 1. 'iavf_get_vf_resources' triggers 'if_link_state_change' function call. > > 2. 'pci_intr_map_msix' will fail with i440fx because no msix support. > > 3. 'if_attach' won't be called in case of i440fx. > > > > So, the 'if_linkstatetask' is executed with a zero cleared function pointer. > > With the q35 chipset emulation, 'pci_intr_map_msix' succeeds, and > > 'if_attach' is called before executing the 'if_linkstatetask'. > > > > Notifying the link state change is not needed before calling the > > 'if_attach' function because users won't see the interface. > > I added the 'sc_if_attached' flag and changed it to call > > 'if_link_state_change' if the flag is set. See my patch as follows. > > > > I confirmed the kernel doesn't panic and the iavf interfaces won't be > > created with i440fx chipset emulation. > > > > OK? > > > > ``` > > diff --git a/sys/dev/pci/if_iavf.c b/sys/dev/pci/if_iavf.c > > index d573d6725f4..1dd6188bda2 100644 > > --- a/sys/dev/pci/if_iavf.c > > +++ b/sys/dev/pci/if_iavf.c > > @@ -578,6 +578,7 @@ struct iavf_softc { > > uint32_t sc_major_ver; > > uint32_t sc_minor_ver; > > > > + int sc_if_attached; > > int sc_got_vf_resources; > > int sc_got_irq_map; > > uint32_t sc_vf_id; > > @@ -906,6 +907,7 @@ iavf_attach(struct device *parent, struct device *self, void *aux) > > > > if_attach_queues(ifp, iavf_nqueues(sc)); > > if_attach_iqueues(ifp, iavf_nqueues(sc)); > > + sc->sc_if_attached++; > > > > iavf_intr_enable(sc); > > > > @@ -2257,7 +2259,8 @@ iavf_process_vc_event(struct iavf_softc *sc, struct iavf_aq_desc *desc, > > > > if (ifp->if_link_state != link) { > > ifp->if_link_state = link; > > - if_link_state_change(ifp); > > + if (sc->sc_if_attached) > > + if_link_state_change(ifp); > > } > > break; > > > > ``` > > > > -- > > Yuichiro NAITO (naito.yuichiro@gmail.com) > > > > > -- Kindest regards, Tom Smyth.