
From: Moritz Buhl <mbuhl@openbsd.org>
Subject: Re: vio(4): recover from missed RX interrupts in vio_rxtick
To: tech@openbsd.org
Date: Wed, 15 Apr 2026 04:58:31 +0200

On my Intel(R) Atom(TM) CPU C3558 with 4 cores and 14 VMs, this
diff has had a noticeable positive effect on the VMs' network
interfaces.  I have been running it on all VMs since the end of
February; before that I would see one VM (vio interface) lose its
connection every few days.

I still see some races when starting and stopping VMs, especially
after a physical reboot (vionet_rx: driver not ready), but once the
interface is up, it stays up with this diff.

On Sun, Feb 22, 2026 at 07:22:16PM +0100, Renaud Allard wrote:
> Hi,
> 
> I've been running an OpenBSD 7.8 VM on Oracle Cloud (arm64, KVM) with a
> vio(4) interface doing sustained 50-100 Mbps.  Every few days, the
> interface goes completely dead -- no packets in or out.  A reboot from
> the cloud console fixes it for another few days.
> 
> I traced the problem through the driver and I believe the root cause is
> that vio_rxtick doesn't poll the RX used ring the way vio_txtick polls
> the TX used ring.  If an RX interrupt gets lost (which can happen with
> EVENT_IDX -- the man page already documents flag 0x2 as a workaround
> for exactly this class of bug), the RX side has no way to recover.
> 
> Here's the diff, then the explanation.
> 
> Index: sys/dev/pv/if_vio.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/pv/if_vio.c,v
> retrieving revision 1.78
> diff -u -p -r1.78 if_vio.c
> --- sys/dev/pv/if_vio.c	15 Jan 2026 09:06:19 -0000	1.78
> +++ sys/dev/pv/if_vio.c	22 Feb 2026 00:00:00 -0000
> @@ -1661,6 +1661,7 @@ vio_rxtick(void *arg)
>  	int i;
> 
>  	for (i = 0; i < sc->sc_nqueues; i++) {
> +		virtio_check_vq(sc->sc_virtio, sc->sc_q[i].viq_rxvq);
>  		mtx_enter(&sc->sc_q[i].viq_rxmtx);
>  		vio_populate_rx_mbufs(sc, &sc->sc_q[i]);
>  		mtx_leave(&sc->sc_q[i].viq_rxmtx);
> 
> 
> The problem
> -----------
> 
> There's an asymmetry between how the TX and RX timer handlers work.
> 
> vio_txtick calls virtio_check_vq on the TX used ring.  The comment
> above vio_tx_intr explains why:
> 
>     vio_txtick is used to make sure that mbufs are dequeued and freed
>     even if no further transfer happens.
> 
> So if a TX interrupt is lost, vio_txtick picks up the slack within a
> second.  This is the right thing to do.
> 
> vio_rxtick, on the other hand, only calls vio_populate_rx_mbufs, which
> adds new buffers to the available ring.  It never looks at the used
> ring.  If an RX interrupt is lost, nobody ever drains the completed
> packets.
> 
> 
> What happens when an RX interrupt is lost
> ------------------------------------------
> 
> When VIRTIO_F_RING_EVENT_IDX is negotiated (the default), the driver
> tells the host "interrupt me when the used index reaches N" by writing
> N into VQ_USED_EVENT.  The host is supposed to compare its used index
> against this value and fire an interrupt when it crosses the threshold.
> 
> The virtio spec (2.7.7.1) says this mechanism is "not reliable, as
> they are not synchronized with the device."  The vio(4) man page
> documents flag 0x2 as a workaround for hosts that get this wrong.
> Oracle Cloud's KVM on arm64 appears to be one of those hosts.
> 
> When the interrupt doesn't arrive, here's what happens:
> 
>   1. Completed packets pile up in the used ring.  vio_rxeof never runs,
>      so if_rxr_put never frees those ring slots.
> 
>   2. vio_rxtick fires every second and calls vio_populate_rx_mbufs.
>      But if_rxr_get says there are zero free slots -- the driver thinks
>      all the buffers are still in flight.  So nothing gets added.
> 
>   3. The host runs out of available buffers and starts dropping packets.
>      No RX means no TCP ACKs, so TX dries up and the interface is dead.
> 
>   4. This loops forever.  vio_rxtick keeps firing, keeps finding zero
>      free slots, keeps doing nothing.  Only a reboot recovers.
> 
> 
> Why the fix works
> -----------------
> 
> Adding virtio_check_vq before the existing vio_populate_rx_mbufs call
> means: first check if there's unprocessed work in the used ring, drain
> it if there is, then refill the available ring.
> 
> On the normal path (interrupts working fine), virtio_check_vq sees
> vq_used_idx == vq_used->idx and returns immediately.  One DMA sync
> and one integer comparison, once a second.  No measurable cost.
> 
> On the recovery path (missed interrupt), virtio_check_vq finds stale
> completions, calls vio_rx_intr, which drains them via vio_rxeof, frees
> the ring slots, refills the available ring, and re-enables interrupts.
> Normal operation resumes within one second.
> 
> 
> Why virtio_check_vq goes before the mutex
> ------------------------------------------
> 
> virtio_check_vq calls vq->vq_done, which for RX is vio_rx_intr.
> vio_rx_intr takes viq_rxmtx internally, so it handles its own locking.
> The call has to go before the mtx_enter/vio_populate_rx_mbufs/mtx_leave
> block so that the used ring is drained first, freeing slots, and then
> vio_populate_rx_mbufs can refill them -- all in the same tick.
> 
> If it went after, vio_populate_rx_mbufs would still find zero free
> slots, do nothing, and the refill would have to wait for the next tick.
> 
> 
> Why it's safe to call without the mutex
> ----------------------------------------
> 
> Every existing caller of virtio_check_vq calls it without holding
> viq_rxmtx.  This is the established pattern throughout the driver:
> 
>   - vio_txtick calls it from timeout context, no mutex
>   - vio_queue_intr calls it from interrupt context, no mutex
>   - virtio_pci_queue_intr calls it from interrupt context, no mutex
>   - virtio_pci_shared_queue_intr, same
>   - virtio_pci_legacy_intr_mpsafe, same
> 
> The RX callback (vio_rx_intr) acquires viq_rxmtx as its first action,
> so the locking is self-contained.  The new call in vio_rxtick is
> identical in pattern to the existing call in vio_txtick -- same
> function, same context, same convention.
> 
> 
> Interrupt modes
> ---------------
> 
> I checked all four interrupt configurations to make sure the fix is
> useful in each:
> 
>   1. Child-managed MSI-X (multi-queue): TX and RX share a vector per
>      queue pair via vio_queue_intr, which checks both.  If the host
>      suppresses the interrupt entirely, neither gets checked.
> 
>   2. Per-VQ MSI-X: each VQ has its own vector.  A lost RX interrupt
>      cannot be recovered by a TX interrupt at all.
> 
>   3. Shared MSI-X: all VQs share one vector.  A TX interrupt also
>      checks RX.  But under heavy inbound-only traffic, TX interrupts
>      become infrequent.
> 
>   4. Legacy: same as shared.
> 
> In all four cases, vio_rxtick is the only guaranteed periodic check,
> and it's the only thing that can reliably recover from a lost RX
> interrupt regardless of traffic pattern.
> 
> 
> Best Regards