Index | Thread | Search

From:
Hrvoje Popovski <hrvoje@srce.hr>
Subject:
Re: ix(4) LRO/TSO fix
To:
Marcus Glocker <marcus@nazgul.ch>, tech@openbsd.org, Claudio Jeker <cjeker@diehard.n-r-g.com>
Date:
Thu, 15 Feb 2024 10:35:43 +0100

Download raw body.

Thread
On 15.2.2024. 10:23, Claudio Jeker wrote:
> On Thu, Feb 15, 2024 at 07:55:06AM +0100, Marcus Glocker wrote:
>> Hi,
>>
>> When we did commit the em(4) TSO diff, we got an issues reported by
>> Hrvoje, where a combination of ix(4), em(4), and vlan(4) was creating
>> watchdog timeouts on em(4).
>>
>> In the ix(4) RX/LRO code path, the driver makes a decision whether to
>> TSO tag the received packet for forwarding.  For that, the header length
>> gets calculated to find out about the packet payload.  Two things go
>> wrong in this calculation:
>>
>> 1. The VLAN header length is included.  But the NIC cuts off the VLAN
>>    header already.
>>
>> 2. Possible padding (as seen for example on very small packets) is not
>>    considered, and counted as data payload.
>>
>> What happened after the em(4) TSO diff commit, is that ix(4) was
>> receiving ACK packets with 0 bytes payload, but did calculate 2 bytes
>> payload for it.  Hence, the packet was TSO tagged and forwarded.  Our
>> TCP stack did re-calculate 0 bytes payload, and forwarded the packet to
>> em(4).  There it reached the new TSO code path which was requested to
>> offload a packet with 0 bytes payload, which makes the TX fail, and
>> creates an watchdog timeout.
>>
>> The following ix(4) diff is fixing the calculation as following:
>>
>> 1. Don't include the VLAN header length.
>>
>> 2. Do calculate the real packet data payload by using the total IP
>>    length field.  With the last ether_extract_headers() commit, the
>>    IP length field can now be easily accessed.
>>
>> On the Hrvoje setup, no more em(4) watchdog timeouts can be seen with
>> this diff.
>>
>> OK?
>  
> Accessing the packet data inside the rxeof function is not ideal.
> Mainly we fault in DMA memory into the CPU cache and then do the same
> again later on the CPU handling the data.
> In an ideal world the driver would only touch the metadata but never
> access the DMA memory so that there is no cache pollution. As usual
> reality is far from an ideal world.
>  
> I wonder if other LRO capable devices will need similar changes and if in
> that case this should be moved to the generic input processing.

I have mcx, ixl and bnxt and can put them in that test box with em
interface. Would that be ok ?