
From: Stefan Fritsch <sf@sfritsch.de>
Subject: Re: vio(4): recover from missed RX interrupts in vio_rxtick
To: Renaud Allard <renaud@allard.it>
Cc: tech@openbsd.org
Date: Thu, 16 Apr 2026 23:16:55 +0200

Hi,

sorry, I missed this mail in February.

On Sun, 22 Feb 2026, Renaud Allard wrote:
> I've been running an OpenBSD 7.8 VM on Oracle Cloud (arm64, KVM) with a

I would be interested in a dmesg from this VM, taken with a kernel built
with VIRTIO_DEBUG defined to 1.

> vio(4) interface doing sustained 50-100 Mbps.  Every few days, the
> interface goes completely dead -- no packets in or out.  A reboot from
> the cloud console fixes it for another few days.
> 
> I traced the problem through the driver and I believe the root cause is
> that vio_rxtick doesn't poll the RX used ring the way vio_txtick polls
> the TX used ring.  If an RX interrupt gets lost (which can happen with
> EVENT_IDX -- the man page already has flag 0x2 as a workaround for
> exactly this class of bug), the RX side has no way to recover.
>
> Here's the diff, then the explanation.
> 
> Index: sys/dev/pv/if_vio.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/pv/if_vio.c,v
> retrieving revision 1.78
> diff -u -p -r1.78 if_vio.c
> --- sys/dev/pv/if_vio.c	15 Jan 2026 09:06:19 -0000	1.78
> +++ sys/dev/pv/if_vio.c	22 Feb 2026 00:00:00 -0000
> @@ -1661,6 +1661,7 @@ vio_rxtick(void *arg)
>  	int i;
> 
>  	for (i = 0; i < sc->sc_nqueues; i++) {
> +		virtio_check_vq(sc->sc_virtio, sc->sc_q[i].viq_rxvq);
>  		mtx_enter(&sc->sc_q[i].viq_rxmtx);
>  		vio_populate_rx_mbufs(sc, &sc->sc_q[i]);
>  		mtx_leave(&sc->sc_q[i].viq_rxmtx);
> 
> 
> The problem
> -----------
> 
> There's an asymmetry between how the TX and RX timer handlers work.
> 
> vio_txtick calls virtio_check_vq on the TX used ring.  The comment
> above vio_tx_intr explains why:
> 
>     vio_txtick is used to make sure that mbufs are dequeued and freed
>     even if no further transfer happens.
> 
> So if a TX interrupt is lost, vio_txtick picks up the slack within a
> second.  This is the right thing to do.
> 
> vio_rxtick, on the other hand, only calls vio_populate_rx_mbufs, which
> adds new buffers to the available ring.  It never looks at the used
> ring.  If an RX interrupt is lost, nobody ever drains the completed
> packets.

The purpose of vio_rxtick is to let us recover after we have run out of
memory and there are no mbufs left in the rx rings.

 
> 
> What happens when an RX interrupt is lost
> ------------------------------------------
> 
> When VIRTIO_F_RING_EVENT_IDX is negotiated (the default), the driver
> tells the host "interrupt me when the used index reaches N" by writing
> N into VQ_USED_EVENT.  The host is supposed to compare its used index
> against this value and fire an interrupt when it crosses the threshold.
> 
> The virtio spec (2.7.7.1) says this mechanism is "not reliable, as
> they are not synchronized with the device."  The vio(4) man page
> documents flag 0x2 as a workaround for hosts that get this wrong.
> Oracle Cloud's KVM on arm64 appears to be one of those hosts.

I think "not reliable" means that the suppression is not reliable: one
may still get an interrupt that one asked not to receive. It does not
mean that interrupt delivery is unreliable.
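
For reference, the event index decision can be written as a small C
function. This is a minimal sketch of the comparison given in the virtio
specification (there called vring_need_event). The name need_event is
illustrative only; this is not the vio(4) or virtio(4) code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Event index test as given in the virtio specification: should the
 * device interrupt after moving its used index from old_idx to new_idx,
 * given that the driver asked for an interrupt once the entry at
 * event_idx has been used?  All arithmetic is modulo 2^16, so index
 * wraparound is handled.
 */
static bool
need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) <
	    (uint16_t)(new_idx - old_idx);
}
```

With event_idx = 5, a move of the used index from 5 to 6 triggers an
interrupt, while a move from 4 to 5 does not. If the driver publishes
event_idx = 2 only after the device has already gone past it (say
old_idx = 3), no later update triggers an interrupt until the index
wraps, which is why the driver has to re-check the used ring after
re-enabling interrupts.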

We had some problems with this in the early days of event index support,
hence the vio(4) config flag. But I think these were fixed long ago, at
least on x86. If you have problems on arm64, then maybe we are still
missing a memory barrier (or using the wrong one) somewhere; arm64 is
less forgiving than x86 in that regard. We could also be missing a
bus_dmamap_sync(), but I would expect that to be noticed on x86 with SEV
and bounce buffers. Another possibility is a bug in the Oracle Cloud
hypervisor or in KVM. Or maybe they use a NIC with virtio offloading and
the NIC has a bug.
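
To illustrate the kind of ordering a missing barrier would break, here
is a minimal sketch using C11 release/acquire atomics as a stand-in for
the kernel's barriers and bus_dmamap_sync() calls. All toy_* names and
TOY_RING_SIZE are hypothetical; none of this is taken from the driver.

```c
#include <stdatomic.h>
#include <stdint.h>

#define TOY_RING_SIZE	8	/* hypothetical size, power of two */

struct toy_used_ring {
	_Atomic uint16_t idx;			/* written by the "device" */
	uint32_t	 id[TOY_RING_SIZE];	/* completed buffer ids */
};

/* Device side: fill the entry first, then publish the new index. */
static void
toy_device_complete(struct toy_used_ring *r, uint32_t id)
{
	uint16_t idx = atomic_load_explicit(&r->idx, memory_order_relaxed);

	r->id[idx % TOY_RING_SIZE] = id;
	/* Release: the entry write must be visible before the index. */
	atomic_store_explicit(&r->idx, (uint16_t)(idx + 1),
	    memory_order_release);
}

/*
 * Driver side: read the index with acquire semantics before reading
 * the entry.  Returns 1 and stores the id if a completion was consumed,
 * 0 if the ring holds nothing new.
 */
static int
toy_driver_poll(struct toy_used_ring *r, uint16_t *seen, uint32_t *id)
{
	uint16_t idx = atomic_load_explicit(&r->idx, memory_order_acquire);

	if (idx == *seen)
		return 0;
	*id = r->id[*seen % TOY_RING_SIZE];
	(*seen)++;
	return 1;
}
```

If the acquire load were a plain load, a weakly ordered CPU such as
arm64 would be allowed to read the entry before the index and so see
stale data; x86 orders these loads more strictly, which would hide the
mistake there.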

I have committed your diff, partly because it also seems to help with
stability on vmd. But whenever we hit this code path, network traffic will
stall for up to one second. Therefore we should still figure out whether
we have a bug somewhere else. I will try to reproduce it on my arm64 VM on
Ampere/Linux/KVM.

Cheers,
Stefan


> When the interrupt doesn't arrive, here's what happens:
> 
>   1. Completed packets pile up in the used ring.  vio_rxeof never runs,
>      so if_rxr_put never frees those ring slots.
> 
>   2. vio_rxtick fires every second and calls vio_populate_rx_mbufs.
>      But if_rxr_get says there are zero free slots -- the driver thinks
>      all the buffers are still in flight.  So nothing gets added.
> 
>   3. The host runs out of available buffers and starts dropping packets.
>      No RX means no TCP ACKs, TX dries up, the interface is dead.
> 
>   4. This loops forever.  vio_rxtick keeps firing, keeps finding zero
>      free slots, keeps doing nothing.  Only a reboot recovers.
> 
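
The four steps above can be reproduced with a toy slot-accounting
model. This is only a sketch under simplifying assumptions: the ring is
reduced to three counters, and toy_rxq, toy_host_rx, toy_drain,
toy_refill and TOY_NSLOTS are hypothetical names standing in for the
host, vio_rxeof and vio_populate_rx_mbufs.

```c
#include <assert.h>

#define TOY_NSLOTS	4	/* hypothetical ring size */

struct toy_rxq {
	int avail;	/* buffers posted, not yet filled by the host */
	int used;	/* buffers filled by the host, not yet drained */
	int dropped;	/* packets the host had no buffer for */
};

/* Host receives one packet: consume a posted buffer, or drop. */
static void
toy_host_rx(struct toy_rxq *q)
{
	if (q->avail > 0) {
		q->avail--;
		q->used++;
	} else
		q->dropped++;
}

/* Stand-in for vio_rxeof: drain all completions, freeing their slots. */
static int
toy_drain(struct toy_rxq *q)
{
	int n = q->used;

	q->used = 0;
	return n;
}

/* Stand-in for vio_populate_rx_mbufs: post buffers into free slots. */
static int
toy_refill(struct toy_rxq *q)
{
	int nfree = TOY_NSLOTS - q->avail - q->used;

	assert(nfree >= 0);
	q->avail += nfree;
	return nfree;
}
```

Starting with a full ring, once the host has consumed all four buffers
and nothing drains them, every call to toy_refill() returns 0 and every
further packet is counted as dropped. A single toy_drain() makes the
next toy_refill() return 4 again.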
> 
> Why the fix works
> -----------------
> 
> Adding virtio_check_vq before the existing vio_populate_rx_mbufs call
> means: first check if there's unprocessed work in the used ring, drain
> it if there is, then refill the available ring.
> 
> On the normal path (interrupts working fine), virtio_check_vq sees
> vq_used_idx == vq_used->idx and returns immediately.  One DMA sync
> and one integer comparison, once a second.  No measurable cost.
> 
> On the recovery path (missed interrupt), virtio_check_vq finds stale
> completions, calls vio_rx_intr, which drains them via vio_rxeof, frees
> the ring slots, refills the available ring, and re-enables interrupts.
> Normal operation resumes within one second.
> 
> 
> Why virtio_check_vq goes before the mutex
> ------------------------------------------
> 
> virtio_check_vq calls vq->vq_done, which for RX is vio_rx_intr.
> vio_rx_intr takes viq_rxmtx internally, so it handles its own locking.
> The call has to go before the mtx_enter/vio_populate_rx_mbufs/mtx_leave
> block so that the used ring is drained first, freeing slots, and then
> vio_populate_rx_mbufs can refill them -- all in the same tick.
> 
> If it went after, vio_populate_rx_mbufs would still find zero free
> slots, do nothing, and the refill would have to wait for the next tick.
> 
> 
> Why it's safe to call without the mutex
> ----------------------------------------
> 
> Every existing caller of virtio_check_vq calls it without holding
> viq_rxmtx.  This is the established pattern throughout the driver:
> 
>   - vio_txtick calls it from timeout context, no mutex
>   - vio_queue_intr calls it from interrupt context, no mutex
>   - virtio_pci_queue_intr calls it from interrupt context, no mutex
>   - virtio_pci_shared_queue_intr, same
>   - virtio_pci_legacy_intr_mpsafe, same
> 
> The RX callback (vio_rx_intr) acquires viq_rxmtx as its first action,
> so the locking is self-contained.  The new call in vio_rxtick is
> identical in pattern to the existing call in vio_txtick -- same
> function, same context, same convention.
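
The locking convention described above can be sketched with a toy
non-recursive lock. This is an illustration only: lk_rxq, lk_enter,
lk_leave, lk_rx_intr, lk_check_vq, lk_rxtick and LK_NSLOTS are
hypothetical stand-ins for viq_rxmtx, vio_rx_intr, virtio_check_vq and
vio_rxtick, and lk_enter() reports failure where a real mutex would
deadlock.

```c
#include <assert.h>
#include <stdbool.h>

#define LK_NSLOTS	4	/* hypothetical ring size */

struct lk_rxq {
	bool locked;	/* stand-in for viq_rxmtx (non-recursive) */
	int  used;	/* completions waiting in the used ring */
	int  posted;	/* buffers currently posted to the host */
};

/* Returns false, instead of deadlocking, if the lock is already held. */
static bool
lk_enter(struct lk_rxq *q)
{
	if (q->locked)
		return false;
	q->locked = true;
	return true;
}

static void
lk_leave(struct lk_rxq *q)
{
	assert(q->locked);
	q->locked = false;
}

/* Stand-in for the RX callback: takes the lock as its first action. */
static int
lk_rx_intr(struct lk_rxq *q)
{
	int n;

	if (!lk_enter(q))
		return -1;	/* caller already holds the lock */
	n = q->used;
	q->used = 0;		/* drain completions */
	q->posted += n;		/* refill the slots just freed */
	lk_leave(q);
	return n;
}

/* Stand-in for virtio_check_vq: run the callback only if work exists. */
static int
lk_check_vq(struct lk_rxq *q)
{
	return q->used == 0 ? 0 : lk_rx_intr(q);
}

/* Tick in the order used by the diff: check unlocked, refill locked. */
static int
lk_rxtick(struct lk_rxq *q)
{
	int n = lk_check_vq(q);

	if (lk_enter(q)) {
		q->posted += LK_NSLOTS - q->posted - q->used;
		lk_leave(q);
	}
	return n;
}
```

Calling lk_rxtick() on a queue with four stale completions drains and
reposts all four. Calling lk_check_vq() while the lock is already held
returns -1 in this model, which is the situation the unlocked call in
the diff avoids.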
> 
> 
> Interrupt modes
> ---------------
> 
> I checked all four interrupt configurations to make sure the fix is
> useful in each:
> 
>   1. Child-managed MSI-X (multi-queue): TX and RX share a vector per
>      queue pair via vio_queue_intr, which checks both.  If the host
>      suppresses the interrupt entirely, neither gets checked.
> 
>   2. Per-VQ MSI-X: each VQ has its own vector.  A lost RX interrupt
>      cannot be recovered by a TX interrupt at all.
> 
>   3. Shared MSI-X: all VQs share one vector.  A TX interrupt also
>      checks RX.  But under heavy inbound-only traffic, TX interrupts
>      become infrequent.
> 
>   4. Legacy: same as shared.
> 
> In all four cases, vio_rxtick is the only guaranteed periodic check,
> and it's the only thing that can reliably recover from a lost RX
> interrupt regardless of traffic pattern.
> 
> 
> Best Regards
>