From: Claudio Jeker <cjeker@diehard.n-r-g.com>
Subject: Re: interface multiqueue timeout race
To: Alexander Bluhm <bluhm@openbsd.org>
Cc: tech@openbsd.org
Date: Fri, 27 Sep 2024 16:38:34 +0200

On Fri, Sep 27, 2024 at 03:11:59PM +0200, Alexander Bluhm wrote:
> Hi,
> 
> From time to time I see strange hangs or crashes in my network test
> lab, like this one:
> 
> ddb{0}> trace
> db_enter() at db_enter+0x19
> intr_handler(ffff80005bfab210,ffff80000009e780) at intr_handler+0x91
> Xintr_ioapic_edge16_untramp() at Xintr_ioapic_edge16_untramp+0x18f
> Xspllower() at Xspllower+0x1d
> pool_multi_alloc(ffffffff827a2a48,2,ffff80005bfab504) at pool_multi_alloc+0xcb
> m_pool_alloc(ffffffff827a2a48,2,ffff80005bfab504) at m_pool_alloc+0x4b
> pool_p_alloc(ffffffff827a2a48,2,ffff80005bfab504) at pool_p_alloc+0x68
> pool_do_get(ffffffff827a2a48,2,ffff80005bfab504) at pool_do_get+0xe5
> pool_get(ffffffff827a2a48,2) at pool_get+0xad
> m_clget(0,2,802) at m_clget+0x1cf
> igc_get_buf(ffff8000004ff8e8,109) at igc_get_buf+0xb8
> igc_rxfill(ffff8000004ff8e8) at igc_rxfill+0xad
> igc_rxrefill(ffff8000004ff8e8) at igc_rxrefill+0x27
> softclock_process_tick_timeout(ffff8000004ff970,1) at softclock_process_tick_timeout+0x103
> softclock(0) at softclock+0x11e
> softintr_dispatch(0) at softintr_dispatch+0xe6
> Xsoftclock() at Xsoftclock+0x27
> acpicpu_idle() at acpicpu_idle+0x131
> sched_idle(ffffffff8277aff0) at sched_idle+0x298
> end trace frame: 0x0, count: -19
> 
> igc_rxrefill() may be called from both the timeout and the receive
> interrupt.  As interrupts are per CPU and the timeout can run on any
> CPU, there should be some lock.  The easy fix is to put a mutex around
> igc_rxrefill().
> 
> Note that I fixed a similar problem for em(4) a while ago.  There
> splnet() was enough, as em(4) is not multi-threaded.
> 
> I have added a receive mutex for bnxt, igc, ix, and ixl, as I have
> these interfaces in my lab.
> 
> ok?
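
The race and the proposed mutex fix can be modeled in userland.  This is a minimal sketch under stated assumptions, not the igc(4) code: one POSIX thread stands in for the receive interrupt and another for the refill timeout running on a different CPU, and a pthread mutex stands in for the kernel's mutex(9) (mtx_enter()/mtx_leave()).  The names rx_ring, rx_refill(), intr_thread(), timeout_thread(), run_model() and the RING_SLOTS, ITERS and BATCH values are all hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical userland model: one receive ring whose refill routine is
 * called from two contexts, like igc_rxrefill().  A pthread mutex stands
 * in for the proposed kernel receive mutex.
 */
#define RING_SLOTS	256	/* descriptors in the ring */
#define ITERS		10000	/* rounds run by each context */
#define BATCH		8	/* packets consumed per receive interrupt */

struct rx_ring {
	pthread_mutex_t	mtx;		/* the proposed receive mutex */
	int		filled;		/* slots currently holding a buffer */
	long		allocs;		/* buffers handed to the ring */
	long		consumed;	/* buffers taken by the receive path */
};

/* Fill every empty slot; safe to call from either context. */
static void
rx_refill(struct rx_ring *rxr)
{
	pthread_mutex_lock(&rxr->mtx);
	/* Unlocked, two callers could both see a slot empty and fill it. */
	while (rxr->filled < RING_SLOTS) {
		rxr->filled++;		/* the driver would get an mbuf here */
		rxr->allocs++;
	}
	/* Allocations match ring occupancy: no slot was filled twice. */
	assert(rxr->allocs - rxr->consumed == rxr->filled);
	pthread_mutex_unlock(&rxr->mtx);
}

/* Receive interrupt: take BATCH packets off the ring, then refill. */
static void *
intr_thread(void *arg)
{
	struct rx_ring *rxr = arg;

	for (int i = 0; i < ITERS; i++) {
		pthread_mutex_lock(&rxr->mtx);
		rxr->filled -= BATCH;
		rxr->consumed += BATCH;
		pthread_mutex_unlock(&rxr->mtx);
		rx_refill(rxr);
	}
	return NULL;
}

/* Refill timeout: may fire on any CPU, concurrently with the above. */
static void *
timeout_thread(void *arg)
{
	struct rx_ring *rxr = arg;

	for (int i = 0; i < ITERS; i++)
		rx_refill(rxr);
	return NULL;
}

/* Bring the ring up, run both contexts, and leave the final state. */
static void
run_model(struct rx_ring *rxr)
{
	pthread_t intr, tmo;

	pthread_mutex_init(&rxr->mtx, NULL);
	rxr->filled = 0;
	rxr->allocs = 0;
	rxr->consumed = 0;
	rx_refill(rxr);			/* initial fill at interface up */

	pthread_create(&intr, NULL, intr_thread, rxr);
	pthread_create(&tmo, NULL, timeout_thread, rxr);
	pthread_join(intr, NULL);
	pthread_join(tmo, NULL);
	pthread_mutex_destroy(&rxr->mtx);
}
```

With the lock held across the whole refill loop, the ring ends full and the allocation count equals the initial fill plus every consumed packet, no matter how the two threads interleave.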

Wouldn't it be better to have per-CPU timeouts so that we don't need locks
in those hot paths?
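
The per-CPU idea can be sketched the same way, under loudly stated assumptions: each queue's receive interrupt and its refill timeout are bound to the same CPU, modeled here as the same thread, so they never touch the ring concurrently and the ring carries no mutex.  This is only a model; it does not use any existing timeout(9) interface, and all names and the TMO_EVERY interval are hypothetical.  On real hardware a device interrupt can still preempt softclock on the same CPU (the trace above shows exactly that), so a kernel version would likely still need spl protection such as splnet() around the refill, but not a lock shared across CPUs.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical model of per-CPU timeouts: each queue's receive interrupt
 * and refill timeout run on one CPU (here one thread), so the ring needs
 * no mutex.  Not an existing timeout(9) API.
 */
#define NQUEUES		4	/* one queue (and CPU) per thread */
#define RING_SLOTS	256
#define ITERS		10000
#define BATCH		8
#define TMO_EVERY	16	/* "timeout" fires every 16 interrupts */

struct rx_ring {
	int	filled;		/* slots currently holding a buffer */
	long	allocs;		/* buffers handed to the ring */
	long	consumed;	/* buffers taken by the receive path */
};

/* No lock: only the owning CPU ever touches this ring. */
static void
rx_refill(struct rx_ring *rxr)
{
	while (rxr->filled < RING_SLOTS) {
		rxr->filled++;
		rxr->allocs++;
	}
}

/* One CPU: its receive interrupt plus its own refill timeout. */
static void *
cpu_thread(void *arg)
{
	struct rx_ring *rxr = arg;

	rx_refill(rxr);			/* initial fill at interface up */
	for (int i = 0; i < ITERS; i++) {
		rxr->filled -= BATCH;	/* receive interrupt consumes packets */
		rxr->consumed += BATCH;
		assert(rxr->filled >= 0);
		rx_refill(rxr);		/* ... and refills from the interrupt */
		if (i % TMO_EVERY == TMO_EVERY - 1)
			rx_refill(rxr);	/* timeout fires on this same CPU */
	}
	return NULL;
}

/* Start one thread per "CPU", each owning exactly one ring. */
static void
run_model(struct rx_ring rings[NQUEUES])
{
	pthread_t cpus[NQUEUES];

	for (int q = 0; q < NQUEUES; q++) {
		rings[q] = (struct rx_ring){ 0, 0, 0 };
		pthread_create(&cpus[q], NULL, cpu_thread, &rings[q]);
	}
	for (int q = 0; q < NQUEUES; q++)
		pthread_join(cpus[q], NULL);
}
```

The design trade-off is the one raised above: the hot path drops the mutex, at the cost of having to guarantee that a queue's timeout always runs on the CPU that services that queue's interrupt.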

-- 
:wq Claudio