Download raw body.
ix(4) LRO/TSO fix
On Thu, Feb 15, 2024 at 07:55:06AM +0100, Marcus Glocker wrote:
> Hi,
>
> When we did commit the em(4) TSO diff, we got an issues reported by
> Hrvoje, where a combination of ix(4), em(4), and vlan(4) was creating
> watchdog timeouts on em(4).
>
> In the ix(4) RX/LRO code path, the driver makes a decision whether to
> TSO tag the received packet for forwarding. For that, the header length
> gets calculated to find out about the packet payload. Two things go
> wrong in this calculation:
>
> 1. The VLAN header length is included. But the NIC cuts off the VLAN
> header already.
>
> 2. Possible padding (as seen for example on very small packets) is not
> considered, and counted as data payload.
>
> What happened after the em(4) TSO diff commit, is that ix(4) was
> receiving ACK packets with 0 bytes payload, but did calculate 2 bytes
> payload for it. Hence, the packet was TSO tagged and forwarded. Our
> TCP stack did re-calculate 0 bytes payload, and forwarded the packet to
> em(4). There it reached the new TSO code path which was requested to
> offload a packet with 0 bytes payload, which makes the TX fail, and
> creates an watchdog timeout.
>
> The following ix(4) diff is fixing the calculation as following:
>
> 1. Don't include the VLAN header length.
>
> 2. Do calculate the real packet data payload by using the total IP
> length field. With the last ether_extract_headers() commit, the
> IP length field can now be easily accessed.
>
> On the Hrvoje setup, no more em(4) watchdog timeouts can be seen with
> this diff.
>
> OK?
Accessing the packet data inside the rxeof function is not ideal.
Mainly we fault in DMA memory into the CPU cache and then do the same
again later on the CPU handling the data.
In an ideal world the driver would only touch the metadata but never
access the DMA memory so that there is no cache pollution. As usual
reality is far from an ideal world.
I wonder if other LRO capable devices will need similar changes and if in
that case this should be moved to the generic input processing.
Diff looks reasonable, so OK claudio@.
> Index: if_ix.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/if_ix.c,v
> diff -u -p -r1.208 if_ix.c
> --- if_ix.c 14 Feb 2024 22:41:48 -0000 1.208
> +++ if_ix.c 15 Feb 2024 05:47:22 -0000
> @@ -3260,20 +3260,24 @@ ixgbe_rxeof(struct rx_ring *rxr)
>
> if (pkts > 1) {
> struct ether_extracted ext;
> - uint32_t hdrlen, paylen;
> + uint32_t paylen;
>
> - /* Calculate header size. */
> + /*
> + * Calculate the payload size:
> + *
> + * The packet length returned by the NIC
> + * (sendmp->m_pkthdr.len) can contain
> + * padding, which we don't want to count
> + * in to the payload size. Therefore, we
> + * calculate the real payload size based
> + * on the total ip length field (ext.iplen).
> + */
> ether_extract_headers(sendmp, &ext);
> - hdrlen = sizeof(*ext.eh);
> -#if NVLAN > 0
> - if (ISSET(sendmp->m_flags, M_VLANTAG) ||
> - ext.evh)
> - hdrlen += ETHER_VLAN_ENCAP_LEN;
> -#endif
> + paylen = ext.iplen;
> if (ext.ip4 || ext.ip6)
> - hdrlen += ext.iphlen;
> + paylen -= ext.iphlen;
> if (ext.tcp) {
> - hdrlen += ext.tcphlen;
> + paylen -= ext.tcphlen;
> tcpstat_inc(tcps_inhwlro);
> tcpstat_add(tcps_inpktlro, pkts);
> } else {
> @@ -3285,8 +3289,6 @@ ixgbe_rxeof(struct rx_ring *rxr)
> * mark it as TSO, set a correct mss,
> * and recalculate the TCP checksum.
> */
> - paylen = sendmp->m_pkthdr.len > hdrlen ?
> - sendmp->m_pkthdr.len - hdrlen : 0;
> if (ext.tcp && paylen >= pkts) {
> SET(sendmp->m_pkthdr.csum_flags,
> M_TCP_TSO);
>
--
:wq Claudio
ix(4) LRO/TSO fix