From: Atanas Vladimirov <vlado@bsdbg.net>
Subject: xhci: recover halted endpoints on USB Transaction Errors
To: Tech <tech@openbsd.org>
Date: Sun, 19 Apr 2026 18:37:45 +0300

Hi tech,

On Supermicro X10/X11 boards (tested on X10SLL-F and X11) the emulated
USB keyboard and mouse exposed by the BMC/iKVM stop working after a
BMC reset until the host is rebooted.

Reproducer: press the "Reset" button in the BMC web UI.

When the device re-appears the HID's INTR IN endpoint answers every
poll with a USB Transaction Error:

	xhci0: txerr? code 4	(with XHCI_DEBUG)

Per xHCI r1.1 section 4.10.2.6 a Transaction Error completion leaves
the endpoint in the Halted state. The current xhci_event_xfer_generic()
just sets xfer->status = USBD_IOERROR and breaks without resetting the
endpoint, so every subsequent xfer queued on the pipe is never executed
by the halted endpoint and the keyboard stays dead.
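
To illustrate the failure mode, here is a toy model of the endpoint
state. It is only a sketch: the names (ep_model, ep_submit, ep_reset)
are made up for this mail and do not exist in xhci.c, and the real
state machine in the xHCI spec has more states and conditions than
the three modelled here.

```c
#include <stdbool.h>

/* Simplified endpoint states; illustrative names, not the driver's. */
enum ep_state { EP_RUNNING, EP_HALTED, EP_STOPPED };

struct ep_model {
	enum ep_state	state;
	unsigned int	executed;	/* transfers the endpoint actually ran */
};

/* Queue one transfer; returns true only if the endpoint executed it. */
static bool
ep_submit(struct ep_model *ep, bool txerr)
{
	if (ep->state == EP_STOPPED)
		ep->state = EP_RUNNING;	/* doorbell restarts a stopped endpoint */
	if (ep->state == EP_HALTED)
		return false;		/* halted: the transfer is never run */
	ep->executed++;
	if (txerr)
		ep->state = EP_HALTED;	/* Transaction Error halts the endpoint */
	return true;
}

/* Model of a Reset Endpoint command: Halted -> Stopped. */
static void
ep_reset(struct ep_model *ep)
{
	if (ep->state == EP_HALTED)
		ep->state = EP_STOPPED;
}
```

In this model, once a submit fails with txerr every later ep_submit()
returns false until ep_reset() is called, which is the situation the
unpatched driver ends up in.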

The diff below does two things:

 1) Treats XHCI_CODE_TXERR / XHCI_CODE_SPLITERR like XHCI_CODE_STALL
    and issues an async reset-ep, so the usb stack can restart the
    pipe on a clean endpoint.

 2) Caps the number of consecutive TXERR-driven resets per pipe with
    a small counter in struct xhci_pipe (cleared on any successful or
    short completion).  Once XHCI_TXERR_RETRIES resets in a row have
    failed to clear the error, the pipe is considered wedged, so we
    complete the xfer with USBD_IOERROR and call usb_needs_reattach();
    the hub explore task then detaches the stuck device, resets the
    port and re-enumerates it.  On these boards the BMC has stabilised
    by then and the device comes back in its proper topology (ATEN hub
    with the HID behind it) and the keyboard works again without a
    host reboot.
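
The retry policy from the two points above, reduced to a standalone
sketch. The pipe_model struct, the action enum and on_completion()
are hypothetical and exist only for this illustration; TXERR_RETRIES
mirrors XHCI_TXERR_RETRIES in the diff, and the actions stand in for
the reset-ep command and usb_needs_reattach().

```c
#define TXERR_RETRIES	3	/* mirrors XHCI_TXERR_RETRIES in the diff */

/* What the handler decides to do for one completion event. */
enum action { ACT_COMPLETE_OK, ACT_RESET_EP, ACT_REATTACH };

struct pipe_model {
	unsigned int	txerr_count;	/* consecutive transaction errors */
};

static enum action
on_completion(struct pipe_model *p, int txerr)
{
	if (!txerr) {
		/* Success or short transfer: forget earlier errors. */
		p->txerr_count = 0;
		return ACT_COMPLETE_OK;
	}
	if (++p->txerr_count > TXERR_RETRIES) {
		/* Resets did not help: give up and re-enumerate. */
		p->txerr_count = 0;
		return ACT_REATTACH;
	}
	/* Endpoint is halted: reset it so the pipe can restart. */
	return ACT_RESET_EP;
}
```

With TXERR_RETRIES set to 3, three errors in a row each trigger an
endpoint reset and the fourth asks for a reattach; any good
completion in between starts the count over.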

Please note that I used AI to understand the problem. I tested the patch
on two machines and it works for me, but I understand that it might be
entirely wrong and that someone more capable than me might have a better
approach.

I'll be glad to provide more details or do some extra testing.

Best wishes,
Atanas

Index: dev/usb/xhci.c
===================================================================
--- dev/usb/xhci.c
+++ dev/usb/xhci.c
@@ -70,6 +70,7 @@ struct xhci_pipe {
 	struct usbd_xfer	*pending_xfers[XHCI_MAX_XFER];
 	struct usbd_xfer	*aborted_xfer;
 	int			 halted;
+	unsigned int		 txerr_count;
 	size_t			 free_trbs;
 	int			 skip;
 #define TRB_PROCESSED_NO	0
@@ -78,6 +79,8 @@ struct xhci_pipe {
 	uint8_t			 trb_processed[XHCI_MAX_XFER];
 };
 
+#define	XHCI_TXERR_RETRIES	3
+
 int	xhci_reset(struct xhci_softc *);
 void	xhci_suspend(struct xhci_softc *);
 int	xhci_intr1(struct xhci_softc *);
@@ -953,6 +956,7 @@ xhci_event_xfer_generic(struct xhci_softc *sc, struct
 			    usbd_xfer_isread(xfer) ?
 			    BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
 		xfer->status = USBD_NORMAL_COMPLETION;
+		xp->txerr_count = 0;
 		break;
 	case XHCI_CODE_SHORT_XFER:
 		/*
@@ -977,12 +981,31 @@ xhci_event_xfer_generic(struct xhci_softc *sc, struct
 			    usbd_xfer_isread(xfer) ?
 			    BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
 		xfer->status = USBD_NORMAL_COMPLETION;
+		xp->txerr_count = 0;
 		break;
 	case XHCI_CODE_TXERR:
 	case XHCI_CODE_SPLITERR:
 		DPRINTF(("%s: txerr? code %d\n", DEVNAME(sc), code));
-		xfer->status = USBD_IOERROR;
-		break;
+		/* Prevent any timeout to kick in. */
+		timeout_del(&xfer->timeout_handle);
+		usb_rem_task(xfer->device, &xfer->abort_task);
+
+		/*
+		 * A USB Transaction Error leaves the endpoint Halted
+		 * (xHCI r1.1 4.10.2.6); reset it.  If the endpoint
+		 * keeps failing, ask the hub to re-enumerate the
+		 * device rather than spinning forever.
+		 */
+		if (++xp->txerr_count > XHCI_TXERR_RETRIES) {
+			xp->txerr_count = 0;
+			xfer->status = USBD_IOERROR;
+			usb_needs_reattach(xfer->device);
+			break;
+		}
+		xp->halted = USBD_IOERROR;
+		xp->aborted_xfer = xfer;
+		xhci_cmd_reset_ep_async(sc, slot, dci);
+		return (1);
 	case XHCI_CODE_STALL:
 	case XHCI_CODE_BABBLE:
 		DPRINTF(("%s: babble code %d\n", DEVNAME(sc), code));
@@ -1623,6 +1646,7 @@ xhci_pipe_init(struct xhci_softc *sc, struct usbd_pip
 
 	xp->free_trbs = xp->ring.ntrb;
 	xp->halted = 0;
+	xp->txerr_count = 0;
 
 	sdev->pipes[xp->dci - 1] = xp;