Index | Thread | Search

From:
Alexander Bluhm <alexander.bluhm@gmx.net>
Subject:
Re: ix(4) LRO/TSO fix
To:
Hrvoje Popovski <hrvoje@srce.hr>
Cc:
Marcus Glocker <marcus@nazgul.ch>, tech@openbsd.org, Claudio Jeker <cjeker@diehard.n-r-g.com>
Date:
Thu, 15 Feb 2024 14:34:01 +0100

Download raw body.

Thread
On Thu, Feb 15, 2024 at 10:35:43AM +0100, Hrvoje Popovski wrote:
> On 15.2.2024. 10:23, Claudio Jeker wrote:
> > On Thu, Feb 15, 2024 at 07:55:06AM +0100, Marcus Glocker wrote:
> >> Hi,
> >>
> >> When we did commit the em(4) TSO diff, we got an issues reported by
> >> Hrvoje, where a combination of ix(4), em(4), and vlan(4) was creating
> >> watchdog timeouts on em(4).
> >>
> >> In the ix(4) RX/LRO code path, the driver makes a decision whether to
> >> TSO tag the received packet for forwarding.  For that, the header length
> >> gets calculated to find out about the packet payload.  Two things go
> >> wrong in this calculation:
> >>
> >> 1. The VLAN header length is included.  But the NIC cuts off the VLAN
> >>    header already.
> >>
> >> 2. Possible padding (as seen for example on very small packets) is not
> >>    considered, and counted as data payload.
> >>
> >> What happened after the em(4) TSO diff commit, is that ix(4) was
> >> receiving ACK packets with 0 bytes payload, but did calculate 2 bytes
> >> payload for it.  Hence, the packet was TSO tagged and forwarded.  Our
> >> TCP stack did re-calculate 0 bytes payload, and forwarded the packet to
> >> em(4).  There it reached the new TSO code path which was requested to
> >> offload a packet with 0 bytes payload, which makes the TX fail, and
> >> creates an watchdog timeout.
> >>
> >> The following ix(4) diff is fixing the calculation as following:
> >>
> >> 1. Don't include the VLAN header length.
> >>
> >> 2. Do calculate the real packet data payload by using the total IP
> >>    length field.  With the last ether_extract_headers() commit, the
> >>    IP length field can now be easily accessed.
> >>
> >> On the Hrvoje setup, no more em(4) watchdog timeouts can be seen with
> >> this diff.
> >>
> >> OK?
> >  
> > Accessing the packet data inside the rxeof function is not ideal.
> > Mainly we fault in DMA memory into the CPU cache and then do the same
> > again later on the CPU handling the data.
> > In an ideal world the driver would only touch the metadata but never
> > access the DMA memory so that there is no cache pollution. As usual
> > reality is far from an ideal world.
> >  
> > I wonder if other LRO capable devices will need similar changes and if in
> > that case this should be moved to the generic input processing.
> 
> I have mcx, ixl and bnxt and can put them in that test box with em
> interface. Would that be ok ?

More test coverage is always good.  Except that you have to find a
poor soul who fixes all the bugs :-)

Runnning your usual tests with more hardware variations, will give
us a better feeling when we change something in the driver
infrastructure.  So yes, put it in.