Re: mp-safe genet_qstart()
On Wed, Jan 14, 2026 at 11:01:49PM +0000, Vitaliy Makkoveev wrote:
> On Tue, Jan 13, 2026 at 09:20:24AM +0000, Vitaliy Makkoveev wrote:
> > On Mon, Jan 12, 2026 at 11:40:47AM +1000, Jonathan Matthew wrote:
> > > On Sun, Jan 11, 2026 at 01:18:47PM +0300, Vitaliy Makkoveev wrote:
> > > > On Tue, Jan 06, 2026 at 12:21:28PM +0300, Vitaliy Makkoveev wrote:
> > > > > On Tue, Jan 06, 2026 at 01:04:11AM +0300, Vitaliy Makkoveev wrote:
> > > > > > This diff follows the instructions from net/ifq.h. The interface
> > > > > > queue layer takes care of serialization. To keep the genet_qstart()
> > > > > > path lockless I do atomic operations on sc_tx.queued, which is
> > > > > > shared between genet_qstart() and genet_intr(), to avoid a mutex(9).
> > > > > > The ifq layer also relies on atomic operations, but it excludes
> > > > > > parallel runs of these handlers.
> > > > > >
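For illustration, the shared-counter scheme described above could look
roughly like this; a minimal sketch, not the actual diff, using
OpenBSD's atomic_add_int(9):

	/*
	 * Sketch only: sc_tx.queued is written by both genet_qstart()
	 * (producer) and genet_txintr() (consumer), so without a
	 * mutex(9) it needs atomic updates.  Only genet_qstart()
	 * increments and only genet_txintr() decrements, so a
	 * concurrent decrement can only make this check more
	 * conservative, never unsafe.
	 */
	if (sc->sc_tx.queued >= GENET_DMA_DESC_COUNT - nsegs)
		return (-1);
	atomic_add_int(&sc->sc_tx.queued, nsegs);
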
> > > > > > I do re-lock dances around the barriers within genet_stop(). On the
> > > > > > one hand they are not necessary, because the netlock will not be
> > > > > > taken in either the genet_qstart() or the genet_intr() path. On the
> > > > > > other hand, barriers under the exclusive netlock completely stop
> > > > > > packet processing. Some mp-safe drivers follow the first way, some
> > > > > > follow the second.
> > > > > >
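The "re-lock dance" around the barriers would look roughly like this in
genet_stop(); a sketch only, assuming the ioctl path enters holding the
exclusive netlock:

	/*
	 * Drop the exclusive netlock around the barriers so the rest
	 * of the stack keeps processing packets while we wait for the
	 * transmit and interrupt paths to drain.
	 */
	NET_UNLOCK();
	ifq_barrier(&ifp->if_snd);
	intr_barrier(sc->sc_ih);
	NET_LOCK();
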
> > > > > > Tested for two days on a couple of RPI4 machines under load and a
> > > > > > continuous interface up/down sequence.
> > > > > >
> > > > >
> > > > > A little update. Compared to the previous diff, I pushed the
> > > > > sc_tx.queued decrement back into the loop. The previous version looks
> > > > > too fragile to me. If we want to optimise this path, it is better to
> > > > > do the subtraction after the loop.
> > > > >
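The two variants being weighed here, sketched with placeholder loop
bodies (the real code unloads the DMA map and frees the mbuf):

	uint32_t done;

	/* Variant 1: decrement per completed descriptor (this diff). */
	for (done = 0; done < total; done++) {
		/* ... unload and free buf_map[i].mbuf ... */
		atomic_dec_int(&sc->sc_tx.queued);
	}

	/* Variant 2: count first, then do one subtraction at the end. */
	for (done = 0; done < total; done++) {
		/* ... unload and free buf_map[i].mbuf ... */
	}
	atomic_sub_int(&sc->sc_tx.queued, done);
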
> > > >
> > > > bluhm@ doesn't like re-locking in the ioctl path, so this diff follows
> > > > the ixgbe(4) way and avoids re-locking. This is not a problem, because
> > > > a concurrent genet_qstart() thread never acquires the netlock.
> > > >
> > > > The diff has been stable for more than 36 hours.
> > > >
> > > > ok?
> > >
> > > In most mpsafe drivers we avoid using atomic operations by
> > > calculating the number of free tx slots from the producer and
> > > consumer indexes, where the producer is only updated by qstart
> > > and the consumer is only updated by the interrupt handler, rather
> > > than having a field storing the difference between them that has
> > > to be updated in both. It looks like that approach should work here
> > > too, using the cidx and pidx fields in sc->sc_tx.
> > >
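That index arithmetic could be wrapped in a small helper; a sketch
assuming the 16-bit hardware indexes this driver uses, with
genet_tx_free() as a hypothetical name:

	static inline u_int
	genet_tx_free(struct genet_ring *tx)
	{
		/*
		 * pidx is only advanced by genet_qstart() and cidx only
		 * by the interrupt handler, so neither side needs
		 * atomic operations.  The hardware indexes are 16 bits
		 * wide and wrap, hence the mask on the difference.
		 */
		u_int queued = (tx->pidx - tx->cidx) & 0xffff;

		return (GENET_DMA_DESC_COUNT - queued);
	}
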
> >
> > Thanks for the feedback. The diff was reworked to follow this approach.
> >
>
> No issues for more than two days.
I don't have hardware to test this, but it looks good to me.
One suggestion for a future change below, but ok jmatthew@ as is.
>
> > Index: sys/dev/ic/bcmgenet.c
> > ===================================================================
> > RCS file: /cvs/src/sys/dev/ic/bcmgenet.c,v
> > retrieving revision 1.10
> > diff -u -p -r1.10 bcmgenet.c
> > --- sys/dev/ic/bcmgenet.c 5 Jan 2026 11:49:01 -0000 1.10
> > +++ sys/dev/ic/bcmgenet.c 13 Jan 2026 09:12:25 -0000
> > @@ -195,7 +195,6 @@ genet_setup_txdesc(struct genet_softc *s
> > uint32_t status;
> >
> > status = flags | __SHIFTIN(len, GENET_TX_DESC_STATUS_BUFLEN);
> > - ++sc->sc_tx.queued;
> >
> > WR4(sc, GENET_TX_DESC_ADDRESS_LO(index), (uint32_t)paddr);
> > WR4(sc, GENET_TX_DESC_ADDRESS_HI(index), (uint32_t)(paddr >> 32));
> > @@ -206,7 +205,7 @@ int
> > genet_setup_txbuf(struct genet_softc *sc, int index, struct mbuf *m)
> > {
> > bus_dma_segment_t *segs;
> > - int error, nsegs, cur, i;
> > + int error, queued, nsegs, cur, i;
> > uint32_t flags;
> >
> > /*
> > @@ -233,8 +232,9 @@ genet_setup_txbuf(struct genet_softc *sc
> >
> > segs = sc->sc_tx.buf_map[index].map->dm_segs;
> > nsegs = sc->sc_tx.buf_map[index].map->dm_nsegs;
> > + queued = (sc->sc_tx.pidx - sc->sc_tx.cidx) & 0xffff;
> >
> > - if (sc->sc_tx.queued >= GENET_DMA_DESC_COUNT - nsegs) {
> > + if (queued >= GENET_DMA_DESC_COUNT - nsegs) {
Instead of checking if there's space on the ring here, after loading
the mbuf into a DMA map, what we typically do is check if there's
space for the largest possible packet in *_start before dequeuing,
so we don't have to then roll back if the packet we got won't fit.
The big comment towards the top of src/sys/net/ifq.h shows what
this should look like.
In this driver, the largest possible packet is apparently 128
segments, and the ring only has 256 slots, so this would leave half
the ring unused. 128 segments is 'a lot' for a gigabit interface
that only supports 1500-byte packets, so maybe reducing this to 8 or
so would be better? I think we only get up to 30 segments on 10GbE+
NICs with TSO, so we won't get anywhere near that here.
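The pattern from the ifq.h comment looks roughly like this; a sketch
reusing the hypothetical genet_tx_free() helper above, with
GENET_MAX_TXSEGS standing in for whatever worst-case segment count gets
chosen:

	for (;;) {
		/* Stop dequeuing once a worst-case packet can't fit. */
		if (genet_tx_free(&sc->sc_tx) < GENET_MAX_TXSEGS) {
			ifq_set_oactive(ifq);
			break;
		}

		m = ifq_dequeue(ifq);
		if (m == NULL)
			break;

		/* ... load m into descriptors, advance sc_tx.pidx ... */
	}
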
> > bus_dmamap_unload(sc->sc_tx.buf_tag,
> > sc->sc_tx.buf_map[index].map);
> > return -1;
> > @@ -520,7 +520,6 @@ genet_init_rings(struct genet_softc *sc,
> > /* TX ring */
> >
> > sc->sc_tx.next = 0;
> > - sc->sc_tx.queued = 0;
> > sc->sc_tx.cidx = sc->sc_tx.pidx = 0;
> >
> > WR4(sc, GENET_TX_SCB_BURST_SIZE, 0x08);
> > @@ -657,6 +656,7 @@ genet_stop(struct genet_softc *sc)
> > ifp->if_flags &= ~IFF_RUNNING;
> > ifq_clr_oactive(&ifp->if_snd);
> >
> > + ifq_barrier(&ifp->if_snd);
> > intr_barrier(sc->sc_ih);
> >
> > /* Clean RX ring. */
> > @@ -749,17 +749,15 @@ genet_txintr(struct genet_softc *sc, int
> > {
> > struct ifnet *ifp = &sc->sc_ac.ac_if;
> > struct genet_bufmap *bmap;
> > - uint32_t cidx, total;
> > - int i;
> > + uint32_t queued;
> >
> > - cidx = RD4(sc, GENET_TX_DMA_CONS_INDEX(qid)) & 0xffff;
> > - total = (cidx - sc->sc_tx.cidx) & 0xffff;
> > + queued = (RD4(sc, GENET_TX_DMA_CONS_INDEX(qid)) -
> > + sc->sc_tx.cidx) & 0xffff;
> >
> > - for (i = sc->sc_tx.next; sc->sc_tx.queued > 0 && total > 0;
> > - i = TX_NEXT(i), total--) {
> > + while (queued > 0) {
> > /* XXX check for errors */
> >
> > - bmap = &sc->sc_tx.buf_map[i];
> > + bmap = &sc->sc_tx.buf_map[sc->sc_tx.next];
> > if (bmap->mbuf != NULL) {
> > bus_dmamap_sync(sc->sc_tx.buf_tag, bmap->map,
> > 0, bmap->map->dm_mapsize,
> > @@ -769,21 +767,20 @@ genet_txintr(struct genet_softc *sc, int
> > bmap->mbuf = NULL;
> > }
> >
> > - --sc->sc_tx.queued;
> > - }
> > -
> > - if (sc->sc_tx.cidx != cidx) {
> > - sc->sc_tx.next = i;
> > - sc->sc_tx.cidx = cidx;
> > + queued--;
> >
> > - if (ifq_is_oactive(&ifp->if_snd))
> > - ifq_restart(&ifp->if_snd);
> > + sc->sc_tx.cidx++;
> > + sc->sc_tx.next = TX_NEXT(sc->sc_tx.next);
> > }
> > +
> > + if (ifq_is_oactive(&ifp->if_snd))
> > + ifq_restart(&ifp->if_snd);
> > }
> >
> > void
> > -genet_start(struct ifnet *ifp)
> > +genet_qstart(struct ifqueue *ifq)
> > {
> > + struct ifnet *ifp = ifq->ifq_if;
> > struct genet_softc *sc = ifp->if_softc;
> > struct mbuf *m;
> > const int qid = GENET_DMA_DEFAULT_QUEUE;
> > @@ -971,7 +968,8 @@ genet_attach(struct genet_softc *sc)
> > ifp->if_softc = sc;
> > snprintf(ifp->if_xname, IFNAMSIZ, "%s", sc->sc_dev.dv_xname);
> > ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
> > - ifp->if_start = genet_start;
> > + ifp->if_xflags = IFXF_MPSAFE;
> > + ifp->if_qstart = genet_qstart;
> > ifp->if_ioctl = genet_ioctl;
> > ifq_init_maxlen(&ifp->if_snd, IFQ_MAXLEN);
> >
> > Index: sys/dev/ic/bcmgenetvar.h
> > ===================================================================
> > RCS file: /cvs/src/sys/dev/ic/bcmgenetvar.h,v
> > retrieving revision 1.1
> > diff -u -p -r1.1 bcmgenetvar.h
> > --- sys/dev/ic/bcmgenetvar.h 14 Apr 2020 21:02:39 -0000 1.1
> > +++ sys/dev/ic/bcmgenetvar.h 13 Jan 2026 09:12:25 -0000
> > @@ -51,7 +51,7 @@ struct genet_bufmap {
> > struct genet_ring {
> > bus_dma_tag_t buf_tag;
> > struct genet_bufmap buf_map[GENET_DMA_DESC_COUNT];
> > - u_int next, queued;
> > + u_int next;
> > uint32_t cidx, pidx;
> > };
> >