From: Atanas Vladimirov
Subject: Re: xhci: recover halted endpoints on USB Transaction Errors
To: Tech
Date: Mon, 11 May 2026 12:56:47 +0300

On 2026-04-19 18:37, Atanas Vladimirov wrote:
> Hi tech,
>
> On Supermicro X10/X11 boards (tested on X10SLL-F and X11) the emulated
> USB keyboard and mouse exposed by the BMC/iKVM stop working after a
> BMC reset until the host is rebooted.
>
> Reproducer: "Reset" button in the BMC web UI.
>
> When the device re-appears the HID's INTR IN endpoint answers every
> poll with a USB Transaction Error:
>
>   xhci0: txerr? code 4   (with XHCI_DEBUG)
>
> Per xHCI r1.1 section 4.10.2.6 a Transaction Error completion leaves
> the endpoint in the Halted state. The current xhci_event_xfer_generic()
> just sets xfer->status = USBD_IOERROR and breaks, so every subsequent
> xfer queued on the pipe is silently dropped by the halted endpoint --
> the keyboard dies for good.
>
> The diff below does two things:
>
> 1) Treats XHCI_CODE_TXERR / XHCI_CODE_SPLITERR like XHCI_CODE_STALL
>    and issues an async reset-ep, so the usb stack can restart the
>    pipe on a clean endpoint.
>
> 2) Caps the number of consecutive TXERR-driven resets per pipe with
>    a small counter in struct xhci_pipe (reset on any successful or
>    short completion). After XHCI_TXERR_RETRIES failures the pipe
>    is obviously wedged, so we complete the xfer with USBD_IOERROR
>    and call usb_needs_reattach() -- the hub explore task then
>    detaches the stuck device, resets the port and re-enumerates it.
>    On these boards the BMC has stabilised by then and the device
>    comes back in its proper topology (ATEN hub with the HID behind
>    it) and the keyboard works again without a host reboot.
>
> Please note that I used AI to understand the problem. Tested the patch on
> two machines and it works for me. But I understand that it might be totally
> wrong and someone, more capable than me, might have a better approach.
>
> I'll be glad to provide more details or do some extra testing.
>
> Best wishes,
> Atanas

Hello,

Just a kind reminder here :)
If you think that this approach is wrong/bad, just let me know and I can
open a bug report to bugs@ with this context and ask for help there.

Best wishes,
Atanas

>
> Index: dev/usb/xhci.c
> ===================================================================
> --- dev/usb/xhci.c
> +++ dev/usb/xhci.c
> @@ -70,6 +70,7 @@ struct xhci_pipe {
>  	struct usbd_xfer	*pending_xfers[XHCI_MAX_XFER];
>  	struct usbd_xfer	*aborted_xfer;
>  	int			 halted;
> +	unsigned int		 txerr_count;
>  	size_t			 free_trbs;
>  	int			 skip;
>  #define TRB_PROCESSED_NO	0
> @@ -78,6 +79,8 @@ struct xhci_pipe {
>  	uint8_t			 trb_processed[XHCI_MAX_XFER];
>  };
> 
> +#define XHCI_TXERR_RETRIES	3
> +
>  int	xhci_reset(struct xhci_softc *);
>  void	xhci_suspend(struct xhci_softc *);
>  int	xhci_intr1(struct xhci_softc *);
> @@ -953,6 +956,7 @@ xhci_event_xfer_generic(struct xhci_softc *sc, struct
>  		    usbd_xfer_isread(xfer) ?
>  		    BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
>  		xfer->status = USBD_NORMAL_COMPLETION;
> +		xp->txerr_count = 0;
>  		break;
>  	case XHCI_CODE_SHORT_XFER:
>  		/*
> @@ -977,12 +981,31 @@ xhci_event_xfer_generic(struct xhci_softc *sc, struct
>  		    usbd_xfer_isread(xfer) ?
>  		    BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
>  		xfer->status = USBD_NORMAL_COMPLETION;
> +		xp->txerr_count = 0;
>  		break;
>  	case XHCI_CODE_TXERR:
>  	case XHCI_CODE_SPLITERR:
>  		DPRINTF(("%s: txerr? code %d\n", DEVNAME(sc), code));
> -		xfer->status = USBD_IOERROR;
> -		break;
> +		/* Prevent any timeout to kick in. */
> +		timeout_del(&xfer->timeout_handle);
> +		usb_rem_task(xfer->device, &xfer->abort_task);
> +
> +		/*
> +		 * A USB Transaction Error leaves the endpoint Halted
> +		 * (xHCI r1.1 4.10.2.6); reset it. If the endpoint
> +		 * keeps failing, ask the hub to re-enumerate the
> +		 * device rather than spinning forever.
> +		 */
> +		if (++xp->txerr_count > XHCI_TXERR_RETRIES) {
> +			xp->txerr_count = 0;
> +			xfer->status = USBD_IOERROR;
> +			usb_needs_reattach(xfer->device);
> +			break;
> +		}
> +		xp->halted = USBD_IOERROR;
> +		xp->aborted_xfer = xfer;
> +		xhci_cmd_reset_ep_async(sc, slot, dci);
> +		return (1);
>  	case XHCI_CODE_STALL:
>  	case XHCI_CODE_BABBLE:
>  		DPRINTF(("%s: babble code %d\n", DEVNAME(sc), code));
> @@ -1623,6 +1646,7 @@ xhci_pipe_init(struct xhci_softc *sc, struct usbd_pip
> 
>  	xp->free_trbs = xp->ring.ntrb;
>  	xp->halted = 0;
> +	xp->txerr_count = 0;
> 
>  	sdev->pipes[xp->dci - 1] = xp;
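
As a reading aid for point 2 of the description, the retry cap can be
modelled outside the kernel. The sketch below is a standalone illustration,
not driver code: struct pipe_model, pipe_model_event() and the COMP_* and
ACT_* names are hypothetical, and TXERR_RETRIES merely mirrors
XHCI_TXERR_RETRIES from the diff. It assumes that only normal and short
completions clear the counter, as in the patch.

```c
/* Hypothetical userland model of the per-pipe TXERR retry cap. */
#define TXERR_RETRIES	3	/* mirrors XHCI_TXERR_RETRIES in the diff */

enum completion { COMP_SUCCESS, COMP_SHORT, COMP_TXERR };
enum action { ACT_COMPLETE, ACT_RESET_EP, ACT_REATTACH };

struct pipe_model {
	unsigned int txerr_count;	/* consecutive transaction errors */
};

/*
 * Decide what to do for one transfer event.
 * Success or short transfer: clear the counter, complete normally.
 * Transaction error: reset the endpoint, unless the cap has been
 * exceeded, in which case ask for a re-enumeration instead.
 */
static enum action
pipe_model_event(struct pipe_model *p, enum completion c)
{
	if (c != COMP_TXERR) {
		p->txerr_count = 0;
		return ACT_COMPLETE;
	}
	if (++p->txerr_count > TXERR_RETRIES) {
		p->txerr_count = 0;
		return ACT_REATTACH;
	}
	return ACT_RESET_EP;
}
```

With the cap at 3, three consecutive errors each produce an endpoint reset
and the fourth one requests the reattach; any successful or short
completion in between starts the count over.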