Index | Thread | Search

From:
Stefan Sperling <stsp@stsp.name>
Subject:
Re: hardware LRO support for bnxt(4)
To:
Jonathan Matthew <jonathan@d14n.org>
Cc:
tech@openbsd.org
Date:
Fri, 31 Oct 2025 11:09:55 +0100

Download raw body.

Thread
On Fri, Oct 31, 2025 at 01:11:08PM +1000, Jonathan Matthew wrote:
> > +#define BNXT_MAX_RX_SEGS	32
> 
> Using this as max_agg_segs in the vnic tpa config means it'll combine up to
> this many received TCP segments off the wire into an aggregation; does this
> mean it'll also use that many buffers off the ag ring, or does it pack
> multiple segments into each ag buffer?  I guess it packs them, because you can
> let it aggregate a lot more segments than buffers it can use in a single
> aggregation.

Yes, I believe the frames were packed because we were using "Jumbo"
placement. Quoting the header file:

	/*
	 * Jumbo: TPA Packet was placed using jumbo algorithm. This
	 * means that the first buffer will be filled with data before
	 * moving to aggregation buffers. Each aggregation buffer will
	 * be filled before moving to the next aggregation buffer.
	 */

In fact we probably don't really want to have multiple packets in a
single mbuf. I will try to find a way to avoid that.

> > -	if_rxr_init(&rx->rxr[0], 32, rx->rx_ring.ring_size - 1);
> > -	if_rxr_init(&rx->rxr[1], 32, rx->rx_ag_ring.ring_size - 1);
> > +	if_rxr_init(&rx->rxr[0], BNXT_MAX_RX_SEGS, rx->rx_ring.ring_size - 1);
> > +	if_rxr_init(&rx->rxr[1], BNXT_MAX_RX_SEGS * 4,
> > +	    rx->rx_ag_ring.ring_size - 1);
> 
> Not sure I follow this part.  The requirement to fill this much of the rx ring
> (at least at first) isn't related to LRO/TPA at all, as far as I can tell.
> It was there before we even used the aggregation ring.  Was it necessary to
> increase this to 128 to get TPA working?

Putting more buffers on the ring is needed for performance. This is what
makes tcpbench go from 500 Mbps to 9999 Mbps if TPA is enabled.

And what you said about the need to roll back the completion ring
if we do not yet have all expected completions available makes sense.
I have fixed this locally.

Still need to find out how to avoid the crashes reported by Hrvoje.
I will send a new diff when I have achieved that.

Thanks,
Stefan