From: Marcus Glocker Subject: ix(4) LRO/TSO fix To: tech@openbsd.org Date: Thu, 15 Feb 2024 07:55:06 +0100 Hi, When we did commit the em(4) TSO diff, we got an issues reported by Hrvoje, where a combination of ix(4), em(4), and vlan(4) was creating watchdog timeouts on em(4). In the ix(4) RX/LRO code path, the driver makes a decision whether to TSO tag the received packet for forwarding. For that, the header length gets calculated to find out about the packet payload. Two things go wrong in this calculation: 1. The VLAN header length is included. But the NIC cuts off the VLAN header already. 2. Possible padding (as seen for example on very small packets) is not considered, and counted as data payload. What happened after the em(4) TSO diff commit, is that ix(4) was receiving ACK packets with 0 bytes payload, but did calculate 2 bytes payload for it. Hence, the packet was TSO tagged and forwarded. Our TCP stack did re-calculate 0 bytes payload, and forwarded the packet to em(4). There it reached the new TSO code path which was requested to offload a packet with 0 bytes payload, which makes the TX fail, and creates an watchdog timeout. The following ix(4) diff is fixing the calculation as following: 1. Don't include the VLAN header length. 2. Do calculate the real packet data payload by using the total IP length field. With the last ether_extract_headers() commit, the IP length field can now be easily accessed. On the Hrvoje setup, no more em(4) watchdog timeouts can be seen with this diff. OK? Index: if_ix.c =================================================================== RCS file: /cvs/src/sys/dev/pci/if_ix.c,v diff -u -p -r1.208 if_ix.c --- if_ix.c 14 Feb 2024 22:41:48 -0000 1.208 +++ if_ix.c 15 Feb 2024 05:47:22 -0000 @@ -3260,20 +3260,24 @@ ixgbe_rxeof(struct rx_ring *rxr) if (pkts > 1) { struct ether_extracted ext; - uint32_t hdrlen, paylen; + uint32_t paylen; - /* Calculate header size. */ + /* + * Calculate the payload size: + * + * The packet length returned by the NIC + * (sendmp->m_pkthdr.len) can contain + * padding, which we don't want to count + * in to the payload size. Therefore, we + * calculate the real payload size based + * on the total ip length field (ext.iplen). + */ ether_extract_headers(sendmp, &ext); - hdrlen = sizeof(*ext.eh); -#if NVLAN > 0 - if (ISSET(sendmp->m_flags, M_VLANTAG) || - ext.evh) - hdrlen += ETHER_VLAN_ENCAP_LEN; -#endif + paylen = ext.iplen; if (ext.ip4 || ext.ip6) - hdrlen += ext.iphlen; + paylen -= ext.iphlen; if (ext.tcp) { - hdrlen += ext.tcphlen; + paylen -= ext.tcphlen; tcpstat_inc(tcps_inhwlro); tcpstat_add(tcps_inpktlro, pkts); } else { @@ -3285,8 +3289,6 @@ ixgbe_rxeof(struct rx_ring *rxr) * mark it as TSO, set a correct mss, * and recalculate the TCP checksum. */ - paylen = sendmp->m_pkthdr.len > hdrlen ? - sendmp->m_pkthdr.len - hdrlen : 0; if (ext.tcp && paylen >= pkts) { SET(sendmp->m_pkthdr.csum_flags, M_TCP_TSO);