From: Jan Klemkow <jan@openbsd.org>
Subject: Re: igc(4): fix recv. jumbo frames
To: Alexander Bluhm <bluhm@openbsd.org>
Cc: tech@openbsd.org, Theo de Raadt <deraadt@openbsd.org>
Date: Thu, 8 Aug 2024 12:51:18 +0200

On Wed, Aug 07, 2024 at 11:38:53PM +0200, Alexander Bluhm wrote:
> On Thu, Aug 01, 2024 at 06:48:17PM +0200, Jan Klemkow wrote:
> > On Wed, Jul 31, 2024 at 10:39:25AM -0600, Theo de Raadt wrote:
> > > Alexander Bluhm <bluhm@openbsd.org> wrote:
> > > > On Tue, Jul 30, 2024 at 08:12:08AM +0200, Jan Klemkow wrote:
> > > > > The DMA mapping and the allocation of mbufs for jumbo frames use
> > > > > different sizes.  Thus, we end up with corrupt mbufs, which leads
> > > > > to panics later in the TCP/IP stack:
> > > > > 
> > > > > panic: kernel diagnostic assertion "M_DATABUF(m) + M_SIZE(m) >= (m->m_data + m->m_len)"
> > > > > failed: file "/usr/src/sys/kern/uipc_mbuf.c", line 1364
> > > > > 
> > > > > With the following diff, we use the same size for mapping and
> > > > > allocation.
> > > > > 
> > > > > ok?
> > > > 
> > > > I have tested this diff successfully.  Before, jumbo frames with
> > > > igc(4) panicked; now TCP is faster.
> > > > 
> > > > But are we allocating too much in the non-jumbo case?
> > > >     sc->rx_mbuf_sz = MAX_JUMBO_FRAME_SIZE + ETHER_ALIGN;
> > > >     m = MCLGETL(NULL, M_DONTWAIT, sc->rx_mbuf_sz)
> > > > 
> > > > This means for each receive packet we allocate a 16k mbuf cluster.
> > > > Usually only 1500 bytes are needed.
> > > > 
> > > > ix(4) has this code and manages to chain the mbufs correctly.
> > > >     /* Use 2k clusters, even for jumbo frames */
> > > >     sc->rx_mbuf_sz = MCLBYTES + ETHER_ALIGN;
> > > 
> > > This hardware chains fragments in the ring?
> > > 
> > > It's a tradeoff.  Jumbos are rare, so why reserve so much memory for
> > > them on hardware which can do better.
> > 
> > I based my diff on the FreeBSD code.  Also, the Linux code looks like
> > they don't use scatter/gather DMA.
> > 
> > But it works with the diff below.
> > 
> > The extra ETHER_ALIGN bytes seem to be wrong.  They also lead to panics
> > caused by corrupted mbufs.  FreeBSD just uses MCLBYTES as the rx mbuf
> > size.  NetBSD also removed ETHER_ALIGN from their copy of our driver [1].
> > 
> > [1]: https://github.com/NetBSD/src/commit/fb38d839b48b9b6204dbbee1672454d6e719ba01
> > 
> > Index: dev/pci/if_igc.c
> > ===================================================================
> > RCS file: /mount/openbsd/cvs/src/sys/dev/pci/if_igc.c,v
> > diff -u -p -r1.25 if_igc.c
> > --- dev/pci/if_igc.c	24 May 2024 06:02:53 -0000	1.25
> > +++ dev/pci/if_igc.c	1 Aug 2024 15:45:58 -0000
> > @@ -881,7 +881,7 @@ igc_init(void *arg)
> >  	}
> >  	igc_initialize_transmit_unit(sc);
> >  
> > -	sc->rx_mbuf_sz = MCLBYTES + ETHER_ALIGN;
> > +	sc->rx_mbuf_sz = MCLBYTES;
> >  	/* Prepare receive descriptors and buffers. */
> >  	if (igc_setup_receive_structures(sc)) {
> >  		printf("%s: Could not setup receive structures\n",
> > @@ -2159,7 +2159,7 @@ igc_allocate_receive_buffers(struct igc_
> >  	rxbuf = rxr->rx_buffers;
> >  	for (i = 0; i < sc->num_rx_desc; i++, rxbuf++) {
> >  		error = bus_dmamap_create(rxr->rxdma.dma_tag,
> > -		    MAX_JUMBO_FRAME_SIZE, 1, MAX_JUMBO_FRAME_SIZE, 0,
> > +		    MAX_JUMBO_FRAME_SIZE, IGC_MAX_SCATTER, MCLBYTES, 0,
> >  		    BUS_DMA_NOWAIT, &rxbuf->map);
> >  		if (error) {
> >  			printf("%s: Unable to create RX DMA map\n",
> 
> The diff above survives my tests.  But the #define IGC_MAX_SCATTER
> 40 seems like a magic value.  It is not mentioned anywhere in the spec.
> With MAX_JUMBO_FRAME_SIZE == 9216 and MCLBYTES == 2048 there can
> be at most 5 segments.  The diff below also passes my tests.
> 
> Is the more specific value IGC_MAX_GATHER == 5 better?

I get your point about changing the name from SCATTER to something else,
but GATHER isn't used in any other driver.  In some cases the value is
just hardcoded; all other cases use something like MAX_SEGMENTS or
MAX_SEGS.  That also matches the nsegments parameter of
bus_dmamap_create() better.
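
For reference, this is the bus_dmamap_create(9) prototype lined up with
the arguments in the diff below (illustration only, not part of the
patch):

	int
	bus_dmamap_create(bus_dma_tag_t tag, bus_size_t size, int nsegments,
	    bus_size_t maxsegsz, bus_size_t boundary, int flags,
	    bus_dmamap_t *dmamp);

	error = bus_dmamap_create(rxr->rxdma.dma_tag,
	    MAX_JUMBO_FRAME_SIZE,	/* size: total bytes one map covers */
	    IGC_MAX_SEGS,		/* nsegments: max segments per map */
	    MCLBYTES,			/* maxsegsz: one 2k cluster each */
	    0, BUS_DMA_NOWAIT, &rxbuf->map);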

> Or could the hardware try to create larger frames?

bnx(4), for instance, uses 8 instead of 5 for its busdma mappings.  And
many other drivers just map and allocate jumbo size or hardmtu in one
segment for their receive buffers.

Thus I would suggest increasing it a bit.  Better safe than sorry, until
we know an accurate or official value for this limit.
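
As a sanity check, the worst-case segment count with 2k clusters works
out like this (standalone sketch, constants hardcoded just to show the
arithmetic):

	#include <stdio.h>

	#define MAX_JUMBO_FRAME_SIZE	9216
	#define MCLBYTES		2048

	int
	main(void)
	{
		/* worst case: each DMA segment is one 2k rx cluster */
		int segs = (MAX_JUMBO_FRAME_SIZE + MCLBYTES - 1) / MCLBYTES;

		printf("max segments: %d\n", segs);	/* prints 5 */
		return 0;
	}

So IGC_MAX_SEGS == 8 leaves some headroom above the computed 5.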

ok?

bye,
Jan

Index: dev/pci/if_igc.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_igc.c,v
diff -u -p -r1.25 if_igc.c
--- dev/pci/if_igc.c	24 May 2024 06:02:53 -0000	1.25
+++ dev/pci/if_igc.c	6 Aug 2024 08:36:31 -0000
@@ -881,7 +881,7 @@ igc_init(void *arg)
 	}
 	igc_initialize_transmit_unit(sc);
 
-	sc->rx_mbuf_sz = MCLBYTES + ETHER_ALIGN;
+	sc->rx_mbuf_sz = MCLBYTES;
 	/* Prepare receive descriptors and buffers. */
 	if (igc_setup_receive_structures(sc)) {
 		printf("%s: Could not setup receive structures\n",
@@ -2159,7 +2159,7 @@ igc_allocate_receive_buffers(struct igc_
 	rxbuf = rxr->rx_buffers;
 	for (i = 0; i < sc->num_rx_desc; i++, rxbuf++) {
 		error = bus_dmamap_create(rxr->rxdma.dma_tag,
-		    MAX_JUMBO_FRAME_SIZE, 1, MAX_JUMBO_FRAME_SIZE, 0,
+		    MAX_JUMBO_FRAME_SIZE, IGC_MAX_SEGS, MCLBYTES, 0,
 		    BUS_DMA_NOWAIT, &rxbuf->map);
 		if (error) {
 			printf("%s: Unable to create RX DMA map\n",
Index: dev/pci/if_igc.h
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_igc.h,v
diff -u -p -r1.4 if_igc.h
--- dev/pci/if_igc.h	21 May 2024 11:19:39 -0000	1.4
+++ dev/pci/if_igc.h	6 Aug 2024 08:34:42 -0000
@@ -191,6 +191,7 @@
 #define roundup2(size, unit)	(((size) + (unit) - 1) & ~((unit) - 1))
 #define msec_delay(x)		DELAY(1000 * (x))
 
+#define IGC_MAX_SEGS		8
 #define IGC_MAX_SCATTER		40
 #define IGC_TSO_SIZE		65535