Index | Thread | Search

From:
Alexander Bluhm <bluhm@openbsd.org>
Subject:
Re: igc(4): fix recv. jumbo frames
To:
Jan Klemkow <jan@openbsd.org>
Cc:
tech@openbsd.org, Theo de Raadt <deraadt@openbsd.org>
Date:
Wed, 7 Aug 2024 23:38:53 +0200

Download raw body.

Thread
On Thu, Aug 01, 2024 at 06:48:17PM +0200, Jan Klemkow wrote:
> On Wed, Jul 31, 2024 at 10:39:25AM -0600, Theo de Raadt wrote:
> > Alexander Bluhm <bluhm@openbsd.org> wrote:
> > 
> > > On Tue, Jul 30, 2024 at 08:12:08AM +0200, Jan Klemkow wrote:
> > > > The DMA mapping and allocation of mbufs for jumbo frames uses different
> > > > sizes.  Thus, we are ending up with corrupt mbufs, which leads to panics
> > > > in later part of the TCP/IP stack:
> > > > 
> > > > panic: kernel diagnostic assertion "M_DATABUF(m) + M_SIZE(m) >= (m->m_data + m->m_len)"
> > > > failed: file "/usr/src/sys/kern/uipc_mbuf.c", line 1364
> > > > 
> > > > With the following diff, we use the same size for mapping and
> > > > allocation.
> > > > 
> > > > ok?
> > > 
> > > I have tested this diff successfully.  Before jumbo frames with
> > > igc(4) paniced, now TCP is faster.
> > > 
> > > But are we allocating too much in the non-jumbo case?
> > >     sc->rx_mbuf_sz = MAX_JUMBO_FRAME_SIZE + ETHER_ALIGN;
> > >     m = MCLGETL(NULL, M_DONTWAIT, sc->rx_mbuf_sz)
> > > 
> > > This means for each receive packet we allocate a 16k mbuf cluster.
> > > Usually only 1500 byte are needed.
> > > 
> > > ix(4) has this code and manages to chain the mbufs correctly.
> > >     /* Use 2k clusters, even for jumbo frames */
> > >     sc->rx_mbuf_sz = MCLBYTES + ETHER_ALIGN;
> > 
> > This hardware chains fragments in the ring?
> > 
> > It's a tradeoff.  jumbos are rare, so why reserve so much memory to them
> > on hardware which can do better.
> 
> I oriented my diff at the FreeBSD code.  Also, the Linux code looks like
> as they don't use scatter/gather DMA.
> 
> But, It works with the following diff below.
> 
> The extra size of ETHER_ALIGN seems be wrong.  It also leads to panics
> caused by corrupted mbufs.  FreeBSD just use MCLBYTES for rx mbufs size.
> Also NetBSD removed ETHER_ALIGN from their copy of our driver [1].
> 
> [1]: https://github.com/NetBSD/src/commit/fb38d839b48b9b6204dbbee1672454d6e719ba01
> 
> ok?
> 
> bye,
> Jan
> 
> Index: dev/pci/if_igc.c
> ===================================================================
> RCS file: /mount/openbsd/cvs/src/sys/dev/pci/if_igc.c,v
> diff -u -p -r1.25 if_igc.c
> --- dev/pci/if_igc.c	24 May 2024 06:02:53 -0000	1.25
> +++ dev/pci/if_igc.c	1 Aug 2024 15:45:58 -0000
> @@ -881,7 +881,7 @@ igc_init(void *arg)
>  	}
>  	igc_initialize_transmit_unit(sc);
>  
> -	sc->rx_mbuf_sz = MCLBYTES + ETHER_ALIGN;
> +	sc->rx_mbuf_sz = MCLBYTES;
>  	/* Prepare receive descriptors and buffers. */
>  	if (igc_setup_receive_structures(sc)) {
>  		printf("%s: Could not setup receive structures\n",
> @@ -2159,7 +2159,7 @@ igc_allocate_receive_buffers(struct igc_
>  	rxbuf = rxr->rx_buffers;
>  	for (i = 0; i < sc->num_rx_desc; i++, rxbuf++) {
>  		error = bus_dmamap_create(rxr->rxdma.dma_tag,
> -		    MAX_JUMBO_FRAME_SIZE, 1, MAX_JUMBO_FRAME_SIZE, 0,
> +		    MAX_JUMBO_FRAME_SIZE, IGC_MAX_SCATTER, MCLBYTES, 0,
>  		    BUS_DMA_NOWAIT, &rxbuf->map);
>  		if (error) {
>  			printf("%s: Unable to create RX DMA map\n",

The diff above survives my tests.  But the #define IGC_MAX_SCATTER
40 seems like a magic value.  It is nowhere mentioned in the spec.
With MAX_JUMBO_FRAME_SIZE == 9216 and MCLBYTES == 2048 there can
be at most 5 segements.  The diff below also passes my tests.

Is the more specific value IGC_MAX_GATHER == 5 better?
Or could the hardware try to create larger frames?

bluhm

Index: dev/pci/if_igc.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_igc.c,v
diff -u -p -r1.25 if_igc.c
--- dev/pci/if_igc.c	24 May 2024 06:02:53 -0000	1.25
+++ dev/pci/if_igc.c	6 Aug 2024 08:36:31 -0000
@@ -881,7 +881,7 @@ igc_init(void *arg)
 	}
 	igc_initialize_transmit_unit(sc);
 
-	sc->rx_mbuf_sz = MCLBYTES + ETHER_ALIGN;
+	sc->rx_mbuf_sz = MCLBYTES;
 	/* Prepare receive descriptors and buffers. */
 	if (igc_setup_receive_structures(sc)) {
 		printf("%s: Could not setup receive structures\n",
@@ -2159,7 +2159,7 @@ igc_allocate_receive_buffers(struct igc_
 	rxbuf = rxr->rx_buffers;
 	for (i = 0; i < sc->num_rx_desc; i++, rxbuf++) {
 		error = bus_dmamap_create(rxr->rxdma.dma_tag,
-		    MAX_JUMBO_FRAME_SIZE, 1, MAX_JUMBO_FRAME_SIZE, 0,
+		    MAX_JUMBO_FRAME_SIZE, IGC_MAX_GATHER, MCLBYTES, 0,
 		    BUS_DMA_NOWAIT, &rxbuf->map);
 		if (error) {
 			printf("%s: Unable to create RX DMA map\n",
Index: dev/pci/if_igc.h
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_igc.h,v
diff -u -p -r1.4 if_igc.h
--- dev/pci/if_igc.h	21 May 2024 11:19:39 -0000	1.4
+++ dev/pci/if_igc.h	6 Aug 2024 08:34:42 -0000
@@ -191,6 +191,7 @@
 #define roundup2(size, unit)	(((size) + (unit) - 1) & ~((unit) - 1))
 #define msec_delay(x)		DELAY(1000 * (x))
 
+#define IGC_MAX_GATHER		5
 #define IGC_MAX_SCATTER		40
 #define IGC_TSO_SIZE		65535