Index | Thread | Search

From:
Marcus Glocker <marcus@nazgul.ch>
Subject:
ix(4) LRO/TSO fix
To:
tech@openbsd.org
Date:
Thu, 15 Feb 2024 07:55:06 +0100

Download raw body.

Thread
Hi,

When we did commit the em(4) TSO diff, we got an issues reported by
Hrvoje, where a combination of ix(4), em(4), and vlan(4) was creating
watchdog timeouts on em(4).

In the ix(4) RX/LRO code path, the driver makes a decision whether to
TSO tag the received packet for forwarding.  For that, the header length
gets calculated to find out about the packet payload.  Two things go
wrong in this calculation:

1. The VLAN header length is included.  But the NIC cuts off the VLAN
   header already.

2. Possible padding (as seen for example on very small packets) is not
   considered, and counted as data payload.

What happened after the em(4) TSO diff commit, is that ix(4) was
receiving ACK packets with 0 bytes payload, but did calculate 2 bytes
payload for it.  Hence, the packet was TSO tagged and forwarded.  Our
TCP stack did re-calculate 0 bytes payload, and forwarded the packet to
em(4).  There it reached the new TSO code path which was requested to
offload a packet with 0 bytes payload, which makes the TX fail, and
creates an watchdog timeout.

The following ix(4) diff is fixing the calculation as following:

1. Don't include the VLAN header length.

2. Do calculate the real packet data payload by using the total IP
   length field.  With the last ether_extract_headers() commit, the
   IP length field can now be easily accessed.

On the Hrvoje setup, no more em(4) watchdog timeouts can be seen with
this diff.

OK?


Index: if_ix.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_ix.c,v
diff -u -p -r1.208 if_ix.c
--- if_ix.c	14 Feb 2024 22:41:48 -0000	1.208
+++ if_ix.c	15 Feb 2024 05:47:22 -0000
@@ -3260,20 +3260,24 @@ ixgbe_rxeof(struct rx_ring *rxr)
 
 			if (pkts > 1) {
 				struct ether_extracted ext;
-				uint32_t hdrlen, paylen;
+				uint32_t paylen;
 
-				/* Calculate header size. */
+				/*
+				 * Calculate the payload size:
+				 *
+				 * The packet length returned by the NIC
+				 * (sendmp->m_pkthdr.len) can contain
+				 * padding, which we don't want to count
+				 * in to the payload size.  Therefore, we
+				 * calculate the real payload size based
+				 * on the total ip length field (ext.iplen).
+				 */
 				ether_extract_headers(sendmp, &ext);
-				hdrlen = sizeof(*ext.eh);
-#if NVLAN > 0
-				if (ISSET(sendmp->m_flags, M_VLANTAG) ||
-				    ext.evh)
-					hdrlen += ETHER_VLAN_ENCAP_LEN;
-#endif
+				paylen = ext.iplen;
 				if (ext.ip4 || ext.ip6)
-					hdrlen += ext.iphlen;
+					paylen -= ext.iphlen;
 				if (ext.tcp) {
-					hdrlen += ext.tcphlen;
+					paylen -= ext.tcphlen;
 					tcpstat_inc(tcps_inhwlro);
 					tcpstat_add(tcps_inpktlro, pkts);
 				} else {
@@ -3285,8 +3289,6 @@ ixgbe_rxeof(struct rx_ring *rxr)
 				 * mark it as TSO, set a correct mss,
 				 * and recalculate the TCP checksum.
 				 */
-				paylen = sendmp->m_pkthdr.len > hdrlen ?
-				    sendmp->m_pkthdr.len - hdrlen : 0;
 				if (ext.tcp && paylen >= pkts) {
 					SET(sendmp->m_pkthdr.csum_flags,
 					    M_TCP_TSO);