Index | Thread | Search

From:
Yuichiro NAITO <naito.yuichiro@gmail.com>
Subject:
Re: iavf: panic on QEMU with i440fx chipset emulation
To:
yasuoka@openbsd.org
Cc:
kettenis@openbsd.org, tech@openbsd.org
Date:
Mon, 07 Oct 2024 10:22:00 +0900

Download raw body.

Thread
From: YASUOKA Masahiko <yasuoka@openbsd.org>
Subject: Re: iavf: panic on QEMU with i440fx chipset emulation
Date: Fri, 04 Oct 2024 17:19:28 +0900 (JST)

> On Wed, 25 Sep 2024 11:42:02 +0900 (JST)
> Yuichiro NAITO <naito.yuichiro@gmail.com> wrote:
>> Hi, While I'm testing iavf(4) on Linux QEMU with i440fx chipset emulation,
>> I had the following panic message on my OpenBSD current build.
>> I440fx is the default chipset emulation of QEMU. I know q35 chipset
>> emulation is required for msix interrupts. But if users forget to select it,
>> I tested what happens on the default chipset emulation. 
> 
> I think the root problem is this i440fx situation.

No, it's a separate issue.
I mean the iavf(4) panic occurs if an error happens between
'iavf_get_vf_resources' and 'if_atatch' in the 'iavf_attach' function.
Looking at the 'iavf_attach' function, there are at least 3 functions
that may cause an error, 'iavf_config_irq_map', 'pci_intr_map_msix',
and 'pci_intr_establish' functions.
The 'pci_intr_map_msix' case is 100% reproducible on the QEMU i440fx
chipset emulation. The other cases are very rare but should be fixed for
stability.

> The driver assumes the device works with msix.  The physical device is
> connected on PCI-E bus and msix is usable of cource.  Since QEMU is
> doing PCI-passthrough the device to the guest, I think QEMU should
> configure a PCI bus which support msxi for the guest.  But actually
> msix is not usable on OpenBSD guest.
> 
> On FreeBSD, it seems to work around this problem by PCI_QUIRK_ENABLE_MSI_VM
> 
>   https://github.com/freebsd/freebsd-src/blob/release/14.1.0/sys/dev/pci/pci.c#L279-L283
> 
> I think this kind of quirk is needed for us.
> What do you think?

NetBSD also has the same quirk.

https://github.com/NetBSD/src/blob/affb7619d18b17a184eede63a4823f629a2893fc/sys/arch/x86/pci/pci_machdep.c#L273

I agree with you we add the same quirk for OpenBSD.
I will send a patch for the PCI bus driver soon.

>> ```
>> attempt to execute user address 0x0 in supervisor mode
>> kernel: page fault trap, code=10
>> Stopped at      0    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
>> * 20243  98235      0     0x14000      0x200    0  softnet0
>> locore0.S(0,ffff800000047018,fe00000000079ac2,ffffffffffffffff,ffffffffffffffff,0) at 0
>> taskq_thread(ffff800000047000,ffff800000047000,0,0,ffff800000047000,ffffffff8165acd0) at taskq_thread+0x129 [/usr/src/sys/kern/kern_task.c:451]
>> end trace frame: 0x0, count: 14
>> https://www.openbsd.org/ddb.html describes the minimum info required in bug
>> reports.  Insufficient info makes it difficult to find and fix bugs.
>> ddb{0}>
>> ```
>> 
>> It seems that NULL pointer function is called by the 'taskq_thread'.
>> Locore0.S is called for the indirect function call to do RETPOLINE
>> provided by the '__x86_indirect_thunk_r11' function.
>> 
>> Who sets the NULL pointer function?
>> I added the following KASSERT in the 'task_add' function.
>> 
>> ```
>> diff --git a/sys/kern/kern_task.c b/sys/kern/kern_task.c
>> index c3dcc5f48cc..d3a2bb90391 100644
>> --- a/sys/kern/kern_task.c
>> +++ b/sys/kern/kern_task.c
>> @@ -355,6 +355,7 @@ task_add(struct taskq *tq, struct task *w)
>> 	if (ISSET(w->t_flags, TASK_ONQUEUE))
>> 		return (0);
>>  
>> +	KASSERT(w->t_func != NULL);
>> 	mtx_enter(&tq->tq_mtx);
>> 	if (!ISSET(w->t_flags, TASK_ONQUEUE)) {
>> 		rv = 1;
>> 
>> ```
>> 
>> And then got the following backtrace.
>> 
>> ```
>> panic: kernel diagnostic assertion "w->t_func != NULL" failed: file "/usr/src/sys/kern/kern_task.c", line 358
>> Stopped at      db_enter+0x14 [/usr/src/sys/arch/amd64/amd64/db_interface.c:437
>> ]:      popq    %rbp
>>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
>> *     0      0      0     0x10000      0x200    0K swapper
>> db_enter(10,ffffffff83a22770,282,8,ffffffff81425b84,ffffffff83a22770) at db_enter+0x14 [/usr/src/sys/arch/amd64/amd64/db_interface.c:437]
>> panic(ffffffff824295f2,ffffffff824295f2,ffff800000046ae0,0,ffff800000135278,ffffffff827a0bee) at panic+0xdd [/usr/src/sys/kern/subr_prf.c:227]
>> __assert(ffffffff823dd575,ffffffff82333c7c,166,ffffffff823bd1e5,ffffffff82150ec9,ffffffff83a22830) at __assert+0x29 [/usr/src/sys/kern/subr_prf.c:0]
>> task_add(ffff800000047000,ffff800000135278,7e776ef89ea6f47,ffff800000046ae0,fffffd8003a99000,3) at task_add+0xff [/usr/src/sys/kern/kern_task.c:358]
>> iavf_process_arq(ffff800000135000,1,70306e848b1765c5,63,ffff800000135000,fffffd8003a97000) at iavf_process_arq+0x451 [/usr/src/sys/dev/pci/if_iavf.c:2260]
>> iavf_get_vf_resources(ffff800000135000,ffff800000135000,ae29a6590f97d8d3,ffff8000001355f0,ffff800000135000,ffff800000135600) at iavf_get_vf_resources+0x1cd [/usr/src/sys/dev/pci/if_iavf.c:2543]
>> iavf_attach(ffff80000004a200,ffff800000135000,ffffffff83a22a88,ffff80000004a200,58717fcf7319db74,ffff80000004a200) at iavf_attach+0x3d6 [/usr/src/sys/dev/pci/if_iavf.c:850]
>> config_attach(ffff80000004a200,ffffffff82784028,ffffffff83a22a88,ffffffff81f8e8
>> f0,19817d1faa76602d,80002000) at config_attach+0x22b [/usr/src/sys/kern/subr_autoconf.c:418]
>> pci_probe_device(ffff80000004a200,80002000,0,0,7190d4a2ab338ea0,0) at pci_probe_device+0x595 [/usr/src/sys/dev/pci/pci.c:574]
>> pci_enumerate_bus(ffff80000004a200,0,0,ffff80000004a200,4ebe87163eefc827,ffff800000047280) at pci_enumerate_bus+0x1bc [/usr/src/sys/dev/pci/pci.c:0]
>> config_attach(ffff800000047280,ffffffff82781cf0,ffffffff83a22ca8,ffffffff81e582b0,19817d1ppfaa6b1255,ffff80000004aa00) at config_attach+0x22b [/usr/src/sys/kern/subr_autoconf.c:418]
>> acpipci_attach_bus(ffff800000047280,ffff80000004aa00,8b4a87a3a22af14c,0,ffff800000047280,ffffffff8278c9d8) at acpipci_attach_bus+0x1dd [/usr/src/sys/arch/amd64/pci/acpipci.c:225]
>> acpipci_attach_busses(ffff800000047280,ffff800000047280,72b2f1f7da24d7b0,ffffffff83a22dc8,ffff800000047280,ffff800000095980) at acpipci_attach_busses+0x5c [/usr/src/sys/arch/amd64/pci/acpipci.c:233]
>> mainbus_attach(0,ffff800000047280,0,0,43135163af729900,0) at mainbus_attach+0x1f3 [/usr/src/sys/arch/amd64/amd64/mainbus.c:253]
>> end trace frame: 0xffffffff83a22ee0, count: 0
>> https://www.openbsd.org/ddb.html describes the minimum info required in bug
>> reports.  Insufficient info makes it difficult to find and fix bugs.
>> ddb{0}> 
>> ```
>> 
>> From the 'iavf_attach' function, 'iavf_get_vf_resources' is called.
>> It expects the response of IAVF_VC_OP_GET_VF_RESOURCES.
>> But this backtrace shows 'iavf_process_arq' function is called in the
>> 'if_iavf.c:2260' line. It is in the 'iavf_process_vc_event' function.
>> 
>> In my source code, around 'if_iavf.c:2260' is shown as follows.
>> 
>> ```
>>  2231 static void
>>  2232 iavf_process_vc_event(struct iavf_softc *sc, struct iavf_aq_desc *desc,
>>  2233     struct iavf_aq_buf *buf)
>>  2234 {
>>  2235         struct iavf_vc_pf_event *event;
>>  2236         struct ifnet *ifp = &sc->sc_ac.ac_if;
>>  2237         const struct iavf_link_speed *speed;
>>  2238         int link;
>>  2239 
>>  2240         event = buf->aqb_data;
>>  2241         switch (event->event) {
>>  2242         case IAVF_VC_EVENT_LINK_CHANGE:
>>  2243                 sc->sc_media_status = IFM_AVALID;
>>  2244                 sc->sc_media_active = IFM_ETHER;
>>  2245                 link = LINK_STATE_DOWN;
>>  2246                 if (event->link_status) {
>>  2247                         link = LINK_STATE_UP;
>>  2248                         sc->sc_media_status |= IFM_ACTIVE;
>>  2249 
>>  2250                         ifp->if_baudrate = 0;
>>  2251                         speed = iavf_find_link_speed(sc, event->link_speed);
>>  2252                         if (speed != NULL) {
>>  2253                                 sc->sc_media_active |= speed->media;
>>  2254                                 ifp->if_baudrate = speed->baudrate;
>>  2255                         }
>>  2256                 }
>>  2257 
>>  2258                 if (ifp->if_link_state != link) {
>>  2259                         ifp->if_link_state = link;
>>  2260                         if_link_state_change(ifp);
>>  2261                 }
>>  2262                 break;
>> ```
>> 
>> It means that while waiting for IAVF_VC_OP_GET_VF_RESOURCES,
>> IAVF_VC_OP_EVENT is returned and then changes the interface media status
>> and reports it to userland via the 'if_link_state_change' function.
>> 
>> In the 'if_link_state_change' function, '&ifp->if_linkstatetask' is
>> passed to the 'task_add' function. The 'ifp' is a pointer to 'sturct ifnet'
>> and the structure entity is in the driver context. So the 'if_linkstatetask'
>> is zero cleared at first, and then set by the 'if_attach' function.
>> 
>> Let's back to what the 'iavf_attach' function does. The key points are:
>> 
>> 1. 'iavf_get_vf_resources' triggers 'if_link_state_change' function call.
>> 2. 'pci_intr_map_msix' will fail with i440fx because no msix support.
>> 3. 'if_attach' won't be called in case of i440fx.
>> 
>> So, the 'if_linkstatetask' is executed with a zero cleared function pointer.
>> With the q35 chipset emulation, 'pci_intr_map_msix' succeeds, and
>> 'if_attach' is called before executing the 'if_linkstatetask'.
>> 
>> Notifying the link state change is not needed before calling the
>> 'if_attach' function because users won't see the interface.
>> I added the 'sc_if_attached' flag and changed it to call
>> 'if_link_state_change' if the flag is set. See my patch as follows.
>> 
>> I confirmed the kernel doesn't panic and the iavf interfaces won't be
>> created with i440fx chipset emulation.
>> 
>> OK?
>> 
>> ```
>> diff --git a/sys/dev/pci/if_iavf.c b/sys/dev/pci/if_iavf.c
>> index d573d6725f4..1dd6188bda2 100644
>> --- a/sys/dev/pci/if_iavf.c
>> +++ b/sys/dev/pci/if_iavf.c
>> @@ -578,6 +578,7 @@ struct iavf_softc {
>>  	uint32_t		 sc_major_ver;
>>  	uint32_t		 sc_minor_ver;
>>  
>> +	int			 sc_if_attached;
>>  	int			 sc_got_vf_resources;
>>  	int			 sc_got_irq_map;
>>  	uint32_t		 sc_vf_id;
>> @@ -906,6 +907,7 @@ iavf_attach(struct device *parent, struct device *self, void *aux)
>>  
>>  	if_attach_queues(ifp, iavf_nqueues(sc));
>>  	if_attach_iqueues(ifp, iavf_nqueues(sc));
>> +	sc->sc_if_attached++;
>>  
>>  	iavf_intr_enable(sc);
>>  
>> @@ -2257,7 +2259,8 @@ iavf_process_vc_event(struct iavf_softc *sc, struct iavf_aq_desc *desc,
>>  
>>  		if (ifp->if_link_state != link) {
>>  			ifp->if_link_state = link;
>> -			if_link_state_change(ifp);
>> +			if (sc->sc_if_attached)
>> +				if_link_state_change(ifp);
>>  		}
>>  		break;
>>  
>> ```
>> 
>> -- 
>> Yuichiro NAITO (naito.yuichiro@gmail.com)
>> 
>> 

-- 
Yuichiro NAITO (naito.yuichiro@gmail.com)