From: Alexander Bluhm
Subject: Re: interface multiqueue timeout race
To: tech@openbsd.org
Date: Fri, 27 Sep 2024 17:25:02 +0200

On Fri, Sep 27, 2024 at 04:38:34PM +0200, Claudio Jeker wrote:
> On Fri, Sep 27, 2024 at 03:11:59PM +0200, Alexander Bluhm wrote:
> > Hi,
> >
> > From time to time I see strange hangs or crashes in my network test
> > lab.  Like this one
> >
> > ddb{0}> trace
> > db_enter() at db_enter+0x19
> > intr_handler(ffff80005bfab210,ffff80000009e780) at intr_handler+0x91
> > Xintr_ioapic_edge16_untramp() at Xintr_ioapic_edge16_untramp+0x18f
> > Xspllower() at Xspllower+0x1d
> > pool_multi_alloc(ffffffff827a2a48,2,ffff80005bfab504) at pool_multi_alloc+0xcb
> > m_pool_alloc(ffffffff827a2a48,2,ffff80005bfab504) at m_pool_alloc+0x4b
> > pool_p_alloc(ffffffff827a2a48,2,ffff80005bfab504) at pool_p_alloc+0x68
> > pool_do_get(ffffffff827a2a48,2,ffff80005bfab504) at pool_do_get+0xe5
> > pool_get(ffffffff827a2a48,2) at pool_get+0xad
> > m_clget(0,2,802) at m_clget+0x1cf
> > igc_get_buf(ffff8000004ff8e8,109) at igc_get_buf+0xb8
> > igc_rxfill(ffff8000004ff8e8) at igc_rxfill+0xad
> > igc_rxrefill(ffff8000004ff8e8) at igc_rxrefill+0x27
> > softclock_process_tick_timeout(ffff8000004ff970,1) at softclock_process_tick_timeout+0x103
> > softclock(0) at softclock+0x11e
> > softintr_dispatch(0) at softintr_dispatch+0xe6
> > Xsoftclock() at Xsoftclock+0x27
> > acpicpu_idle() at acpicpu_idle+0x131
> > sched_idle(ffffffff8277aff0) at sched_idle+0x298
> > end trace frame: 0x0, count: -19
> >
> > igc_rxrefill() may be called from both timeout and receive interrupt.
> > As interrupts are per CPU and the timeout can run on any CPU, there
> > should be some lock.  The easy fix is to put a mutex around
> > igc_rxrefill().
> >
> > Note that I fixed a similar problem for em(4) a while ago.  There,
> > splnet() was enough, as it is not multi-threaded.
> >
> > I have added a receive mutex for bnxt, igc, ix, and ixl, as I have
> > these interfaces in my lab.
> >
> > ok?
>
> Wouldn't it be better to have per-CPU timeouts so that we don't need
> locks in those hot paths?

Yes, but we don't have them.  So I prefer a mutex over random hangs.

How would we implement per-CPU timeouts?  Do we need a timeout thread
for every CPU?  Or some softclock AST that is CPU aware?  Or a task
thread per CPU?

In this case it does not make much difference whether we use tasks or
timeout threads.  There was already the idea to unify them, but I am
not sure whether a common subsystem for both is a good idea.

bluhm
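
For illustration, a minimal sketch of the mutex approach, not the
committed diff: the structure layout, the names rx_mtx and
igc_rxrefill_sketch(), and the igc_rxfill() return convention are
assumptions.  The point is that the refill path can be entered from the
receive interrupt on one CPU and from the softclock timeout on another,
so a mutex at IPL_NET serializes it.

struct rx_ring {
	struct igc_softc	*sc;		/* back pointer to softc */
	struct timeout		 rx_refill;	/* refill retry timeout */
	struct mutex		 rx_mtx;	/* serializes ring refill */
	/* ... descriptor and if_rxring state ... */
};

/* at attach time: mtx_init(&rxr->rx_mtx, IPL_NET); */

void
igc_rxrefill_sketch(void *xrxr)
{
	struct rx_ring *rxr = xrxr;

	/*
	 * Exclude the other entry point: the interrupt handler on this
	 * CPU is blocked while we run at IPL_NET, a caller on another
	 * CPU spins on the mutex.
	 */
	mtx_enter(&rxr->rx_mtx);
	if (igc_rxfill(rxr) == 0) {
		/* no buffers could be posted, retry from the timeout */
		timeout_add(&rxr->rx_refill, 1);
	}
	mtx_leave(&rxr->rx_mtx);
}

While the mutex is held the CPU runs at IPL_NET, which is the kind of
cost in the hot path that per-CPU timeouts would avoid.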