From: Atanas Vladimirov <vlado@bsdbg.net>
Subject: Re: xhci: recover halted endpoints on USB Transaction Errors
To: Tech <tech@openbsd.org>
Date: Mon, 11 May 2026 12:56:47 +0300

On 2026-04-19 18:37, Atanas Vladimirov wrote:
> Hi tech,
> 
> On Supermicro X10/X11 boards (tested on X10SLL-F and X11) the emulated
> USB keyboard and mouse exposed by the BMC/iKVM stop working after a
> BMC reset until the host is rebooted.
> 
> Reproducer: press the "Reset" button in the BMC web UI.
> 
> When the device re-appears the HID's INTR IN endpoint answers every
> poll with a USB Transaction Error:
> 
> 	xhci0: txerr? code 4	(with XHCI_DEBUG)
> 
> Per xHCI r1.1 section 4.10.2.6 a Transaction Error completion leaves
> the endpoint in the Halted state. The current xhci_event_xfer_generic()
> just sets xfer->status = USBD_IOERROR and breaks, so every subsequent
> xfer queued on the pipe is silently dropped by the halted endpoint --
> the keyboard dies for good.
> 
> The diff below does two things:
> 
>  1) Treats XHCI_CODE_TXERR / XHCI_CODE_SPLITERR like XHCI_CODE_STALL
>     and issues an async reset-ep, so the USB stack can restart the
>     pipe on a clean endpoint.
> 
>  2) Caps the number of consecutive TXERR-driven resets per pipe with
>     a small counter in struct xhci_pipe (reset on any successful or
>     short completion).  If the endpoint still fails after
>     XHCI_TXERR_RETRIES consecutive resets, the pipe is considered
>     wedged, so we complete the xfer with USBD_IOERROR
>     and call usb_needs_reattach() -- the hub explore task then
>     detaches the stuck device, resets the port and re-enumerates it.
>     On these boards the BMC has stabilised by then and the device
>     comes back in its proper topology (ATEN hub with the HID behind
>     it) and the keyboard works again without a host reboot.
> 
> Please note that I used AI to help me understand the problem. I tested
> the patch on two machines and it works for me, but I understand that it
> might be entirely wrong and that someone more capable than me may have a
> better approach.
> 
> I'll be glad to provide more details or do some extra testing.
> 
> Best wishes,
> Atanas  
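The halting behaviour described in the quoted report can be illustrated with a toy user-space model. This is not xhci(4) code: `struct toy_ep`, `toy_ep_poll()` and `code_halts_endpoint()` are hypothetical names, the numeric completion codes are assumed from the completion code table in the xHCI specification (TXERR = 4 matches the "txerr? code 4" debug line) and should be checked against dev/usb/xhcireg.h, and treating exactly BABBLE, TXERR, STALL and SPLITERR as halting is an assumption of the model. It only shows why failing the xfer without resetting the endpoint loses every later poll, while issuing Reset Endpoint lets polling continue. A real driver also has to reposition the transfer ring before the endpoint runs again; the model ignores that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Completion codes, assumed from the xHCI completion code table;
 * verify against dev/usb/xhcireg.h before relying on the values.
 */
enum {
	CODE_SUCCESS	= 1,
	CODE_BABBLE	= 3,
	CODE_TXERR	= 4,	/* "txerr? code 4" in the debug output */
	CODE_STALL	= 6,
	CODE_SHORT_XFER	= 13,
	CODE_SPLITERR	= 36
};

/* Codes this model treats as leaving the endpoint Halted. */
static bool
code_halts_endpoint(int code)
{
	switch (code) {
	case CODE_BABBLE:
	case CODE_TXERR:
	case CODE_STALL:
	case CODE_SPLITERR:
		return true;
	default:
		return false;
	}
}

/* Hypothetical toy endpoint: only the Halted bit and a drop counter. */
struct toy_ep {
	bool	halted;
	int	dropped;	/* polls that never ran because of the halt */
};

/*
 * One interrupt IN poll.  'code' is what the device would answer;
 * 'reset_on_halt' selects the patched behaviour (issue Reset Endpoint)
 * instead of the current one (only fail the xfer).
 */
static void
toy_ep_poll(struct toy_ep *ep, int code, bool reset_on_halt)
{
	assert(ep != NULL);

	if (ep->halted) {
		/* A Halted endpoint does not execute queued transfers. */
		ep->dropped++;
		return;
	}
	if (code_halts_endpoint(code)) {
		ep->halted = true;
		if (reset_on_halt)
			ep->halted = false;	/* Reset Endpoint clears the halt */
	}
}
```

With `reset_on_halt` false, one TXERR followed by two polls leaves the endpoint halted with two dropped polls, which is the "keyboard dies for good" symptom; with it true, the next poll runs normally.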

Hello,

Just a kind reminder here :)

If you think this approach is wrong, just let me know and I will open
a bug report on bugs@ with this context and ask for help there.

Best wishes,
Atanas 

> 
> Index: dev/usb/xhci.c
> ===================================================================
> --- dev/usb/xhci.c
> +++ dev/usb/xhci.c
> @@ -70,6 +70,7 @@ struct xhci_pipe {
>  	struct usbd_xfer	*pending_xfers[XHCI_MAX_XFER];
>  	struct usbd_xfer	*aborted_xfer;
>  	int			 halted;
> +	unsigned int		 txerr_count;
>  	size_t			 free_trbs;
>  	int			 skip;
>  #define TRB_PROCESSED_NO	0
> @@ -78,6 +79,8 @@ struct xhci_pipe {
>  	uint8_t			 trb_processed[XHCI_MAX_XFER];
>  };
>  
> +#define	XHCI_TXERR_RETRIES	3
> +
>  int	xhci_reset(struct xhci_softc *);
>  void	xhci_suspend(struct xhci_softc *);
>  int	xhci_intr1(struct xhci_softc *);
> @@ -953,6 +956,7 @@ xhci_event_xfer_generic(struct xhci_softc *sc, struct
>  			    usbd_xfer_isread(xfer) ?
>  			    BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
>  		xfer->status = USBD_NORMAL_COMPLETION;
> +		xp->txerr_count = 0;
>  		break;
>  	case XHCI_CODE_SHORT_XFER:
>  		/*
> @@ -977,12 +981,31 @@ xhci_event_xfer_generic(struct xhci_softc *sc, struct
>  			    usbd_xfer_isread(xfer) ?
>  			    BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
>  		xfer->status = USBD_NORMAL_COMPLETION;
> +		xp->txerr_count = 0;
>  		break;
>  	case XHCI_CODE_TXERR:
>  	case XHCI_CODE_SPLITERR:
>  		DPRINTF(("%s: txerr? code %d\n", DEVNAME(sc), code));
> -		xfer->status = USBD_IOERROR;
> -		break;
> +		/* Prevent any timeout to kick in. */
> +		timeout_del(&xfer->timeout_handle);
> +		usb_rem_task(xfer->device, &xfer->abort_task);
> +
> +		/*
> +		 * A USB Transaction Error leaves the endpoint Halted
> +		 * (xHCI r1.1 4.10.2.6); reset it.  If the endpoint
> +		 * keeps failing, ask the hub to re-enumerate the
> +		 * device rather than spinning forever.
> +		 */
> +		if (++xp->txerr_count > XHCI_TXERR_RETRIES) {
> +			xp->txerr_count = 0;
> +			xfer->status = USBD_IOERROR;
> +			usb_needs_reattach(xfer->device);
> +			break;
> +		}
> +		xp->halted = USBD_IOERROR;
> +		xp->aborted_xfer = xfer;
> +		xhci_cmd_reset_ep_async(sc, slot, dci);
> +		return (1);
>  	case XHCI_CODE_STALL:
>  	case XHCI_CODE_BABBLE:
>  		DPRINTF(("%s: babble code %d\n", DEVNAME(sc), code));
> @@ -1623,6 +1646,7 @@ xhci_pipe_init(struct xhci_softc *sc, struct usbd_pip
>  
>  	xp->free_trbs = xp->ring.ntrb;
>  	xp->halted = 0;
> +	xp->txerr_count = 0;
>  
>  	sdev->pipes[xp->dci - 1] = xp;
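The per-pipe retry budget added by the diff above reduces to a small state machine. The following is a hypothetical user-space model, not driver code: `struct toy_pipe`, `toy_pipe_event()` and the `TOY_*` names are invented for illustration, `TOY_TXERR` stands for both XHCI_CODE_TXERR and XHCI_CODE_SPLITERR, and only the counter logic is taken from the patch (clear on a successful or short completion, `++txerr_count > XHCI_TXERR_RETRIES` on a transaction error).

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TXERR_RETRIES	3	/* same value as XHCI_TXERR_RETRIES */

enum toy_code { TOY_SUCCESS, TOY_SHORT_XFER, TOY_TXERR };

enum toy_action {
	TOY_COMPLETE,	/* complete the xfer normally */
	TOY_RESET_EP,	/* mark the pipe halted, issue Reset Endpoint */
	TOY_REATTACH	/* fail with USBD_IOERROR, call usb_needs_reattach() */
};

/* Hypothetical stand-in for the new field in struct xhci_pipe. */
struct toy_pipe {
	unsigned int	txerr_count;
};

static enum toy_action
toy_pipe_event(struct toy_pipe *p, enum toy_code code)
{
	assert(p != NULL);

	if (code != TOY_TXERR) {
		/* Successful or short completion: the pipe works again. */
		p->txerr_count = 0;
		return TOY_COMPLETE;
	}
	/* Same test as the diff: give up once the budget is exceeded. */
	if (++p->txerr_count > TOY_TXERR_RETRIES) {
		p->txerr_count = 0;
		return TOY_REATTACH;
	}
	return TOY_RESET_EP;
}
```

Because the comparison is `>`, a value of 3 means three Reset Endpoint attempts and re-attachment on the fourth consecutive error. Feeding four TOY_TXERR events to a fresh pipe yields TOY_RESET_EP three times and then TOY_REATTACH, with the counter back at zero.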