From: Alexander Bluhm Subject: Re: igc(4): fix recv. jumbo frames To: Jan Klemkow Cc: tech@openbsd.org, Theo de Raadt Date: Wed, 7 Aug 2024 23:38:53 +0200 On Thu, Aug 01, 2024 at 06:48:17PM +0200, Jan Klemkow wrote: > On Wed, Jul 31, 2024 at 10:39:25AM -0600, Theo de Raadt wrote: > > Alexander Bluhm wrote: > > > > > On Tue, Jul 30, 2024 at 08:12:08AM +0200, Jan Klemkow wrote: > > > > The DMA mapping and allocation of mbufs for jumbo frames uses different > > > > sizes. Thus, we are ending up with corrupt mbufs, which leads to panics > > > > in later part of the TCP/IP stack: > > > > > > > > panic: kernel diagnostic assertion "M_DATABUF(m) + M_SIZE(m) >= (m->m_data + m->m_len)" > > > > failed: file "/usr/src/sys/kern/uipc_mbuf.c", line 1364 > > > > > > > > With the following diff, we use the same size for mapping and > > > > allocation. > > > > > > > > ok? > > > > > > I have tested this diff successfully. Before jumbo frames with > > > igc(4) paniced, now TCP is faster. > > > > > > But are we allocating too much in the non-jumbo case? > > > sc->rx_mbuf_sz = MAX_JUMBO_FRAME_SIZE + ETHER_ALIGN; > > > m = MCLGETL(NULL, M_DONTWAIT, sc->rx_mbuf_sz) > > > > > > This means for each receive packet we allocate a 16k mbuf cluster. > > > Usually only 1500 byte are needed. > > > > > > ix(4) has this code and manages to chain the mbufs correctly. > > > /* Use 2k clusters, even for jumbo frames */ > > > sc->rx_mbuf_sz = MCLBYTES + ETHER_ALIGN; > > > > This hardware chains fragments in the ring? > > > > It's a tradeoff. jumbos are rare, so why reserve so much memory to them > > on hardware which can do better. > > I oriented my diff at the FreeBSD code. Also, the Linux code looks like > as they don't use scatter/gather DMA. > > But, It works with the following diff below. > > The extra size of ETHER_ALIGN seems be wrong. It also leads to panics > caused by corrupted mbufs. FreeBSD just use MCLBYTES for rx mbufs size. > Also NetBSD removed ETHER_ALIGN from their copy of our driver [1]. > > [1]: https://github.com/NetBSD/src/commit/fb38d839b48b9b6204dbbee1672454d6e719ba01 > > ok? > > bye, > Jan > > Index: dev/pci/if_igc.c > =================================================================== > RCS file: /mount/openbsd/cvs/src/sys/dev/pci/if_igc.c,v > diff -u -p -r1.25 if_igc.c > --- dev/pci/if_igc.c 24 May 2024 06:02:53 -0000 1.25 > +++ dev/pci/if_igc.c 1 Aug 2024 15:45:58 -0000 > @@ -881,7 +881,7 @@ igc_init(void *arg) > } > igc_initialize_transmit_unit(sc); > > - sc->rx_mbuf_sz = MCLBYTES + ETHER_ALIGN; > + sc->rx_mbuf_sz = MCLBYTES; > /* Prepare receive descriptors and buffers. */ > if (igc_setup_receive_structures(sc)) { > printf("%s: Could not setup receive structures\n", > @@ -2159,7 +2159,7 @@ igc_allocate_receive_buffers(struct igc_ > rxbuf = rxr->rx_buffers; > for (i = 0; i < sc->num_rx_desc; i++, rxbuf++) { > error = bus_dmamap_create(rxr->rxdma.dma_tag, > - MAX_JUMBO_FRAME_SIZE, 1, MAX_JUMBO_FRAME_SIZE, 0, > + MAX_JUMBO_FRAME_SIZE, IGC_MAX_SCATTER, MCLBYTES, 0, > BUS_DMA_NOWAIT, &rxbuf->map); > if (error) { > printf("%s: Unable to create RX DMA map\n", The diff above survives my tests. But the #define IGC_MAX_SCATTER 40 seems like a magic value. It is nowhere mentioned in the spec. With MAX_JUMBO_FRAME_SIZE == 9216 and MCLBYTES == 2048 there can be at most 5 segements. The diff below also passes my tests. Is the more specific value IGC_MAX_GATHER == 5 better? Or could the hardware try to create larger frames? bluhm Index: dev/pci/if_igc.c =================================================================== RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_igc.c,v diff -u -p -r1.25 if_igc.c --- dev/pci/if_igc.c 24 May 2024 06:02:53 -0000 1.25 +++ dev/pci/if_igc.c 6 Aug 2024 08:36:31 -0000 @@ -881,7 +881,7 @@ igc_init(void *arg) } igc_initialize_transmit_unit(sc); - sc->rx_mbuf_sz = MCLBYTES + ETHER_ALIGN; + sc->rx_mbuf_sz = MCLBYTES; /* Prepare receive descriptors and buffers. */ if (igc_setup_receive_structures(sc)) { printf("%s: Could not setup receive structures\n", @@ -2159,7 +2159,7 @@ igc_allocate_receive_buffers(struct igc_ rxbuf = rxr->rx_buffers; for (i = 0; i < sc->num_rx_desc; i++, rxbuf++) { error = bus_dmamap_create(rxr->rxdma.dma_tag, - MAX_JUMBO_FRAME_SIZE, 1, MAX_JUMBO_FRAME_SIZE, 0, + MAX_JUMBO_FRAME_SIZE, IGC_MAX_GATHER, MCLBYTES, 0, BUS_DMA_NOWAIT, &rxbuf->map); if (error) { printf("%s: Unable to create RX DMA map\n", Index: dev/pci/if_igc.h =================================================================== RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_igc.h,v diff -u -p -r1.4 if_igc.h --- dev/pci/if_igc.h 21 May 2024 11:19:39 -0000 1.4 +++ dev/pci/if_igc.h 6 Aug 2024 08:34:42 -0000 @@ -191,6 +191,7 @@ #define roundup2(size, unit) (((size) + (unit) - 1) & ~((unit) - 1)) #define msec_delay(x) DELAY(1000 * (x)) +#define IGC_MAX_GATHER 5 #define IGC_MAX_SCATTER 40 #define IGC_TSO_SIZE 65535