Download raw body.
TSO with small packets
On Fri, Feb 09, 2024 at 01:50:53PM +0100, Claudio Jeker wrote:
> On Fri, Feb 09, 2024 at 01:37:45PM +0100, Alexander Bluhm wrote:
> > On Fri, Feb 09, 2024 at 12:11:52PM +0100, Claudio Jeker wrote:
> > > On Fri, Feb 09, 2024 at 11:37:32AM +0100, Alexander Bluhm wrote:
> > > > Hi,
> > > >
> > > > After a lot of debugging and testing with Hrvoje and mglocker@, we
> > > > have found the cause for occasional watchdog timeouts with em TSO
> > > > diff.
> > > >
> > > > In this setup TCP packets are routed from ix(4) with LRO to em(4)
> > > > with TSO. It happens that ix hardware coalesces packets that are
> > > > in total smaller than MTU. In real live this is rare as either you
> > > > have large TCP packets in a row, or small TCP packets from time to
> > > > time.
> > > >
> > > > Small packets that went though LRO have the TSO bit set. But TSO
> > > > is not necessary as only few small TCP packets have been reassembled.
> > > > This confuses em hardware. So clear TSO flag during interface
> > > > output path in that case.
> > > >
> > > > This is only relevant when forwarding to hardware TSO. Our stack
> > > > does not generate such packets and can handle them. Diff below
> > > > prevents them to reach hardware TSO.
> > > >
> > > > ok?
> > >
> > > But shouldn't the system send out those small packets split up again?
> > > At least I think the idea is that the effect of LRO is reversed while
> > > sending out so that the same amount of packets are sent as without LRO/TSO.
> >
> > The small packets in our case have 2 bytes payload after LRO. We
> > instruct em to split them in 1 byte packets and that fails with
> > watchdog timeouts. It works if I clear the TSO bit.
> >
> > But you are right, my diff may break path MTU discovery. So
> > m_pkthdr.len <= mtu is not ideal. Is there a minmum TSO size
> > hardware can handle? We might use that instead of MTU.
>
> Probably the minumum IPv6 packet size (576 IIRC) would be a good minimum.
RFC 791: All hosts must be prepared to accept datagrams of up to
576 octets (whether they arrive whole or in fragments).
RFC 791: Every internet module must be able to forward a datagram
of 68 octets without further fragmentation.
RFC 1280: IPv6 requires that every link in the internet have an MTU
of 1280 octets or greater.
I think 576 is a sane choice. We don't care about fragments when
doing path MTU. The world is not IPv6.
Hopefully all hardware can handle TSO to split packets of this size.
We see the problems in em with around 60 byte size, depending on
TCP header length.
The diff should fix the packets that Mglocker has seen in Hrvoje's
setup. For packets between 600 and 1400 bytes we don't know what
hardware will do, but it cannot be worse than the current code.
ok?
bluhm
Index: netinet/tcp_output.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_output.c,v
diff -u -p -r1.141 tcp_output.c
--- netinet/tcp_output.c 26 Nov 2023 22:08:10 -0000 1.141
+++ netinet/tcp_output.c 9 Feb 2024 13:11:58 -0000
@@ -1360,6 +1360,12 @@ tcp_if_output_tso(struct ifnet *ifp, str
/* caller must fail later or fragment */
if (!ISSET((*mp)->m_pkthdr.csum_flags, M_TCP_TSO))
return 0;
+ /* send without hardware TSO if within minimum IPv4 packet size */
+ if ((*mp)->m_pkthdr.len <= 576) {
+ CLR((*mp)->m_pkthdr.csum_flags, M_TCP_TSO);
+ return 0;
+ }
+ /* fragment or fail if interface cannot handle size after chopping */
if ((*mp)->m_pkthdr.ph_mss > mtu) {
CLR((*mp)->m_pkthdr.csum_flags, M_TCP_TSO);
return 0;
TSO with small packets