
From: Renaud Allard <renaud@allard.it>
Subject: vio(4): recover from missed RX interrupts in vio_rxtick
To: tech@openbsd.org
Date: Sun, 22 Feb 2026 19:22:16 +0100


Hi,

I've been running an OpenBSD 7.8 VM on Oracle Cloud (arm64, KVM) with a
vio(4) interface doing sustained 50-100 Mbps.  Every few days, the
interface goes completely dead -- no packets in or out.  A reboot from
the cloud console fixes it for another few days.

I traced the problem through the driver and I believe the root cause is
that vio_rxtick doesn't poll the RX used ring the way vio_txtick polls
the TX used ring.  If an RX interrupt gets lost (which can happen with
EVENT_IDX -- the man page already has flag 0x2 as a workaround for
exactly this class of bug), the RX side has no way to recover.

Here's the diff, then the explanation.

Index: sys/dev/pv/if_vio.c
===================================================================
RCS file: /cvs/src/sys/dev/pv/if_vio.c,v
retrieving revision 1.78
diff -u -p -r1.78 if_vio.c
--- sys/dev/pv/if_vio.c	15 Jan 2026 09:06:19 -0000	1.78
+++ sys/dev/pv/if_vio.c	22 Feb 2026 00:00:00 -0000
@@ -1661,6 +1661,7 @@ vio_rxtick(void *arg)
  	int i;

  	for (i = 0; i < sc->sc_nqueues; i++) {
+		virtio_check_vq(sc->sc_virtio, sc->sc_q[i].viq_rxvq);
  		mtx_enter(&sc->sc_q[i].viq_rxmtx);
  		vio_populate_rx_mbufs(sc, &sc->sc_q[i]);
  		mtx_leave(&sc->sc_q[i].viq_rxmtx);


The problem
-----------

There's an asymmetry between how the TX and RX timer handlers work.

vio_txtick calls virtio_check_vq on the TX used ring.  The comment
above vio_tx_intr explains why:

     vio_txtick is used to make sure that mbufs are dequeued and freed
     even if no further transfer happens.

So if a TX interrupt is lost, vio_txtick picks up the slack within a
second.  This is the right thing to do.

vio_rxtick, on the other hand, only calls vio_populate_rx_mbufs, which
adds new buffers to the available ring.  It never looks at the used
ring.  If an RX interrupt is lost, nobody ever drains the completed
packets.


What happens when an RX interrupt is lost
------------------------------------------

When VIRTIO_F_RING_EVENT_IDX is negotiated (the default), the driver
tells the host "interrupt me when the used index reaches N" by writing
N into VQ_USED_EVENT.  The host is supposed to compare its used index
against this value and fire an interrupt when it crosses the threshold.

The virtio spec (2.7.7.1) warns that these notifications are "not
reliable, as they are not synchronized with the device."  The vio(4)
man page documents flag 0x2 as a workaround for hosts that get this
wrong.  Oracle Cloud's KVM on arm64 appears to be one of those hosts.
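To illustrate the race: this is the threshold test the spec defines
(mirrored from the spec's vring_need_event helper), which the host
evaluates to decide whether to interrupt.  If the host runs it before
the driver's write to VQ_USED_EVENT becomes visible, the notification
is skipped and never retried.

```c
#include <assert.h>
#include <stdint.h>

/*
 * EVENT_IDX suppression test from the virtio spec (2.7.7.1).
 * All arithmetic is mod 2^16, which the uint16_t casts provide.
 *
 * event_idx: the threshold the driver wrote into VQ_USED_EVENT
 * new_idx:   the device's used index after adding new entries
 * old_idx:   the used index the last time the device notified
 */
static int
vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) <
	    (uint16_t)(new_idx - old_idx);
}
```

If the device reads a stale event_idx that the used index has already
passed, the comparison comes out false and no interrupt fires.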

When the interrupt doesn't arrive, here's what happens:

   1. Completed packets pile up in the used ring.  vio_rxeof never runs,
      so if_rxr_put never frees those ring slots.

   2. vio_rxtick fires every second and calls vio_populate_rx_mbufs.
      But if_rxr_get says there are zero free slots -- the driver thinks
      all the buffers are still in flight.  So nothing gets added.

   3. The host runs out of available buffers and starts dropping packets.
      With no inbound traffic there are no TCP ACKs, so TX dries up and
      the interface is effectively dead.

   4. This loops forever.  vio_rxtick keeps firing, keeps finding zero
      free slots, keeps doing nothing.  Only a reboot recovers.
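The stall in steps 1-4 can be modeled with a toy version of the slot
accounting (names like rxq, refill, drain are illustrative, not the
driver's; `inflight` plays the role of the if_rxr accounting and
`used_pending` the used-ring backlog):

```c
#include <assert.h>

#define NSLOTS 256	/* ring size, arbitrary for the model */

struct rxq {
	int inflight;     /* buffers the driver believes the host holds */
	int used_pending; /* completions sitting in the used ring */
};

/* vio_populate_rx_mbufs analogue: can only post free slots. */
static int
refill(struct rxq *q)
{
	int nfree = NSLOTS - q->inflight;

	q->inflight += nfree;
	return nfree;	/* number of buffers actually posted */
}

/* vio_rxeof analogue: drains the used ring, freeing slots. */
static int
drain(struct rxq *q)
{
	int n = q->used_pending;

	q->used_pending = 0;
	q->inflight -= n;
	return n;	/* number of completions processed */
}
```

Start with every slot in flight and every completion unprocessed:
refill() alone posts nothing, forever.  Run drain() first and the
next refill() posts the full ring again, which is exactly what the
one-line patch arranges.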


Why the fix works
-----------------

Adding virtio_check_vq before the existing vio_populate_rx_mbufs call
means: first check if there's unprocessed work in the used ring, drain
it if there is, then refill the available ring.

On the normal path (interrupts working fine), virtio_check_vq sees
vq_used_idx == vq_used->idx and returns immediately.  One DMA sync
and one integer comparison, once a second.  No measurable cost.

On the recovery path (missed interrupt), virtio_check_vq finds stale
completions, calls vio_rx_intr, which drains them via vio_rxeof, frees
the ring slots, refills the available ring, and re-enables interrupts.
Normal operation resumes within one second.
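The two paths above reduce to a single index comparison.  A simplified
model (vq_used_idx and vq_done follow the names used in this mail;
the struct layout and check_vq/demo_done are illustrative, and the
real function also does a DMA sync before the load):

```c
#include <assert.h>
#include <stdint.h>

struct vq {
	uint16_t vq_used_idx;      /* last index the driver consumed */
	uint16_t device_used_idx;  /* stands in for vq_used->idx */
	int (*vq_done)(struct vq *);
};

/* Fast path: indices match, nothing to do.  Slow path: stale
 * completions exist, run the handler (vio_rx_intr for RX). */
static int
check_vq(struct vq *vq)
{
	if (vq->vq_used_idx != vq->device_used_idx)
		return vq->vq_done(vq);
	return 0;
}

/* Demo handler standing in for vio_rx_intr: consume everything. */
static int
demo_done(struct vq *vq)
{
	vq->vq_used_idx = vq->device_used_idx;
	return 1;
}
```

Once a second, the common case costs only the comparison; the handler
runs only when an interrupt was actually missed.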


Why virtio_check_vq goes before the mutex
------------------------------------------

virtio_check_vq calls vq->vq_done, which for RX is vio_rx_intr.
vio_rx_intr takes viq_rxmtx internally, so it handles its own locking.
The call has to go before the mtx_enter/vio_populate_rx_mbufs/mtx_leave
block so that the used ring is drained first, freeing slots, and then
vio_populate_rx_mbufs can refill them -- all in the same tick.

If it went after, vio_populate_rx_mbufs would still find zero free
slots, do nothing, and the refill would have to wait for the next tick.


Why it's safe to call without the mutex
----------------------------------------

Every existing caller of virtio_check_vq calls it without holding
viq_rxmtx.  This is the established pattern throughout the driver:

   - vio_txtick calls it from timeout context, no mutex
   - vio_queue_intr calls it from interrupt context, no mutex
   - virtio_pci_queue_intr calls it from interrupt context, no mutex
   - virtio_pci_shared_queue_intr, same
   - virtio_pci_legacy_intr_mpsafe, same

The RX callback (vio_rx_intr) acquires viq_rxmtx as its first action,
so the locking is self-contained.  The new call in vio_rxtick is
identical in pattern to the existing call in vio_txtick -- same
function, same context, same convention.
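The convention, sketched with a pthread mutex in place of the kernel
mutex (rx_handler, rxtick, and q_mtx are illustrative names standing
in for vio_rx_intr, vio_rxtick, and viq_rxmtx):

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t q_mtx = PTHREAD_MUTEX_INITIALIZER;
static int drained;

/* Plays vio_rx_intr: takes the queue mutex as its first action,
 * so its locking is self-contained. */
static void
rx_handler(void)
{
	pthread_mutex_lock(&q_mtx);
	drained++;
	pthread_mutex_unlock(&q_mtx);
}

/* Plays vio_rxtick: calls the handler without the mutex held,
 * then takes it only for the refill step. */
static void
rxtick(void)
{
	rx_handler();			/* drain first, lock-free call */
	pthread_mutex_lock(&q_mtx);
	/* vio_populate_rx_mbufs would refill here */
	pthread_mutex_unlock(&q_mtx);
}
```

Calling rx_handler() while already holding q_mtx would self-deadlock,
which is why the new virtio_check_vq call sits outside the
mtx_enter/mtx_leave block.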


Interrupt modes
---------------

I checked all four interrupt configurations to make sure the fix is
useful in each:

   1. Child-managed MSI-X (multi-queue): TX and RX share a vector per
      queue pair via vio_queue_intr, which checks both.  If the host
      suppresses the interrupt entirely, neither gets checked.

   2. Per-VQ MSI-X: each VQ has its own vector.  A lost RX interrupt
      cannot be recovered by a TX interrupt at all.

   3. Shared MSI-X: all VQs share one vector.  A TX interrupt also
      checks RX.  But under heavy inbound-only traffic, TX interrupts
      become infrequent.

   4. Legacy: same as shared.

In all four cases, vio_rxtick is the only guaranteed periodic check,
and it's the only thing that can reliably recover from a lost RX
interrupt regardless of traffic pattern.


Best Regards