From: Claudio Jeker
Subject: Re: interface multiqueue timout race
To: Alexander Bluhm
Cc: tech@openbsd.org
Date: Fri, 27 Sep 2024 16:38:34 +0200

On Fri, Sep 27, 2024 at 03:11:59PM +0200, Alexander Bluhm wrote:
> Hi,
>
> From time to time I see strange hangs or crashes in my network test
> lab. Like this one
>
> ddb{0}> trace
> db_enter() at db_enter+0x19
> intr_handler(ffff80005bfab210,ffff80000009e780) at intr_handler+0x91
> Xintr_ioapic_edge16_untramp() at Xintr_ioapic_edge16_untramp+0x18f
> Xspllower() at Xspllower+0x1d
> pool_multi_alloc(ffffffff827a2a48,2,ffff80005bfab504) at pool_multi_alloc+0xcb
> m_pool_alloc(ffffffff827a2a48,2,ffff80005bfab504) at m_pool_alloc+0x4b
> pool_p_alloc(ffffffff827a2a48,2,ffff80005bfab504) at pool_p_alloc+0x68
> pool_do_get(ffffffff827a2a48,2,ffff80005bfab504) at pool_do_get+0xe5
> pool_get(ffffffff827a2a48,2) at pool_get+0xad
> m_clget(0,2,802) at m_clget+0x1cf
> igc_get_buf(ffff8000004ff8e8,109) at igc_get_buf+0xb8
> igc_rxfill(ffff8000004ff8e8) at igc_rxfill+0xad
> igc_rxrefill(ffff8000004ff8e8) at igc_rxrefill+0x27
> softclock_process_tick_timeout(ffff8000004ff970,1) at softclock_process_tick_timeout+0x103
> softclock(0) at softclock+0x11e
> softintr_dispatch(0) at softintr_dispatch+0xe6
> Xsoftclock() at Xsoftclock+0x27
> acpicpu_idle() at acpicpu_idle+0x131
> sched_idle(ffffffff8277aff0) at sched_idle+0x298
> end trace frame: 0x0, count: -19
>
> igc_rxrefill() may be called from both, timeout or receive interrupt.
> As interrupts are per CPU and timeout can be on any CPU there should
> be some lock. Easy fix is to put a mutex around igc_rxrefill().
>
> Note that I have fixed similar problem for em(4) a while ago. There
> splnet() was enough as it is not multi threaded.
>
> I have added receive mutex for bnxt, igc, ix, ixl as I have these
> interfaces in my lab.
>
> ok?

Wouldn't it be better to have per-CPU timeouts so that we don't need locks
in those hot paths?

-- 
:wq Claudio