From: Alexander Bluhm
Subject: Re: spread 8 network interrupt over cpu and softnet
To: Jonathan Matthew
Cc: tech@openbsd.org
Date: Tue, 11 Nov 2025 12:20:27 +0100

I would like to commit the global IF_MAX_VECTORS limit.  Then it is
easier for me to find optimal values for the number of softnet threads
and interface queues.

ok? bluhm

On Mon, Oct 27, 2025 at 05:27:59PM +0100, Alexander Bluhm wrote:
> On Mon, Oct 13, 2025 at 02:04:03PM +1000, Jonathan Matthew wrote:
> > On Tue, Oct 07, 2025 at 02:44:24PM +0200, Alexander Bluhm wrote:
> > > On Tue, Oct 07, 2025 at 05:33:35PM +1000, Jonathan Matthew wrote:
> > > > On Sat, Sep 27, 2025 at 06:36:50PM +0200, Alexander Bluhm wrote:
> > > > > Hi,
> > > > >
> > > > > Currently most network drivers use 8 queues and distribute interrupts.
> > > > > ix(4) is the only one that may allocate more than 8.  ice(4) uses
> > > > > only 8 vectors, but allocates more interrupts.
> > > > >
> > > > > I would like to limit interrupts and queues to 8 for all drivers.
> > > > > This prevents running out of interrupt vectors so easily.  Currently
> > > > > we have 8 softnet threads, which limits parallel processing anyway.
> > > > > We can tune these values later.
> > > > >
> > > > > With this diff, forwarding throughput increases from 45 to 50 GBit/sec
> > > > > on my 12 core machine from ice0 to ice1.  I am using iperf3 TCP on
> > > > > Linux to measure it.  I think the slower performance happened because
> > > > > ice(4) was allocating more interrupts than it has vectors.  Now
> > > > > everything is limited to 8.  For other drivers I see no difference
> > > > > as they operate at line speed anyway.
> > > > >
> > > > > I think the arbitrary numbers IXL_MAX_VECTORS and IGC_MAX_VECTORS
> > > > > could go away, but that would be per-driver diffs.
> > > > I think the changes to bnxt, igc and ixl, where you're just picking the
> > > > smaller of two constants, neither of which reflects an actual limit on
> > > > the number of queues available, should be removed from the diff.
> > > > We've mostly settled on 8 as the maximum number of queues to use anyway,
> > > > so this doesn't really change anything.
> > > Are there any hardware limits?  As I want to play with IF_MAX_VECTORS
> > > globally, it would be nice to know the capabilities of the hardware.
> > > That's why I wanted to address this on a per-driver basis.
> > >
> > > igc(4) says
> > > #define IGC_MAX_VECTORS		8
> > > without explanation.  Is this an actual hardware limit?
> > The I225/226 datasheet says the hardware is limited to 4 queues.
> > On the only igc hardware I have, the msi-x table has 5 entries,
> > so we could never use 8 queues there anyway.
> >
> > >
> > > ix has these lines
> > > 	/* XXX the number of queues is limited to what we can keep stats on */
> > > 	maxq = (sc->hw.mac.type == ixgbe_mac_82598EB) ? 8 : 16;
> > >
> > > Are the 8 and 16 restrictions of the hardware?  IF_MAX_VECTORS
> > > should be the value for optimal system behavior.  On top each driver
> > > has its own limitations.  That's how I came to the minimum calculation.
> > The limit here is the number of stats registers.  ix hardware has 64
> > tx/rx queues, but enough registers for reading statistics off all of them.
> >
> > Other cases aside from ice:
> > bnxt - nic firmware gives us limits on the number of msi-x vectors and tx/rx
> > queues, but we currently ignore them and assume 8 will be available.
> >
> > ixl - the datasheet says we can create up to 1536 tx/rx queue pairs, and
> > the number of msi-x vectors available is given in the pci capability
> > structure, which we already take into account.
> >
> > iavf - limited to 4 queues according to the virtual function interface
> > specification.
> >
> > mcx - nic firmware gives us limits on the number of send/receive queues,
> > completion queues and event queues, the lowest of which we should use as
> > the maximum number of queues, but we currently just assume we can use 16
> > queues.
> >
> > aq - register layout only has space for 8 queues as far as I can tell,
> > so 8 is a hardware limit.  Seems to have a bigger msi-x table though.
> >
> > vmx - seems to be limited to 8 tx and 16 rx queues, so we use 8 at most.
> >
> > ngbe - datasheet says 8 tx and rx queues.
>
> Thanks for extracting all these details from the data sheet.
>
> My idea is to give each driver its hardware limit and then calculate
> the minimum with a global limit.  So we get somewhat consistent
> values and can find the optimum setting in the future.
>
> I have tested it with bnxt, ice, igc, ix, ixl.
>
> For aq, iavf, mcx, ngbe, vmx I have no hardware currently in my
> setup.  Except for mcx the values do not change with my diff.
>
> Is this diff a way we can move forward?
>
> bluhm
>
> Index: dev/pci/if_aq_pci.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_aq_pci.c,v
> diff -u -p -r1.32 if_aq_pci.c
> --- dev/pci/if_aq_pci.c	29 Jun 2025 19:32:08 -0000	1.32
> +++ dev/pci/if_aq_pci.c	27 Oct 2025 15:11:56 -0000
> @@ -1310,8 +1310,8 @@ aq_attach(struct device *parent, struct
>  	int nmsix = pci_intr_msix_count(pa);
>  	if (nmsix > 1) {
>  		nmsix--;
> -		sc->sc_intrmap = intrmap_create(&sc->sc_dev,
> -		    nmsix, AQ_MAXQ, INTRMAP_POWEROF2);
> +		sc->sc_intrmap = intrmap_create(&sc->sc_dev, nmsix,
> +		    MIN(AQ_MAXQ, IF_MAX_VECTORS), INTRMAP_POWEROF2);
>  		sc->sc_nqueues = intrmap_count(sc->sc_intrmap);
>  		KASSERT(sc->sc_nqueues > 0);
>  		KASSERT(powerof2(sc->sc_nqueues));
> Index: dev/pci/if_bnxt.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_bnxt.c,v
> diff -u -p -r1.59 if_bnxt.c
> --- dev/pci/if_bnxt.c	13 Oct 2025 10:45:08 -0000	1.59
> +++ dev/pci/if_bnxt.c	27 Oct 2025 15:11:56 -0000
> @@ -545,9 +545,11 @@ bnxt_attach(struct device *parent, struc
>  	nmsix = pci_intr_msix_count(pa);
>  	if (nmsix > 1) {
>  		sc->sc_ih = pci_intr_establish(sc->sc_pc, ih,
> -		    IPL_NET | IPL_MPSAFE, bnxt_admin_intr, sc, DEVNAME(sc));
> -		sc->sc_intrmap = intrmap_create(&sc->sc_dev,
> -		    nmsix - 1, BNXT_MAX_QUEUES, INTRMAP_POWEROF2);
> +		    IPL_NET | IPL_MPSAFE, bnxt_admin_intr, sc,
> +		    DEVNAME(sc));
> +		sc->sc_intrmap = intrmap_create(&sc->sc_dev, nmsix - 1,
> +		    MIN(BNXT_MAX_QUEUES, IF_MAX_VECTORS),
> +		    INTRMAP_POWEROF2);
>  		sc->sc_nqueues = intrmap_count(sc->sc_intrmap);
>  		KASSERT(sc->sc_nqueues > 0);
>  		KASSERT(powerof2(sc->sc_nqueues));
> Index: dev/pci/if_iavf.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_iavf.c,v
> diff -u -p -r1.25 if_iavf.c
> --- dev/pci/if_iavf.c	24 Jun 2025 10:59:15 -0000	1.25
> +++ dev/pci/if_iavf.c	27 Oct 2025 15:11:56 -0000
> @@ -1034,8 +1034,9 @@ iavf_attach(struct device *parent, struc
>  	if (nmsix > 1) { /* we used 1 (the 0th) for the adminq */
>  		nmsix--;
> 
> -		sc->sc_intrmap = intrmap_create(&sc->sc_dev,
> -		    nmsix, IAVF_MAX_VECTORS, INTRMAP_POWEROF2);
> +		sc->sc_intrmap = intrmap_create(&sc->sc_dev, nmsix,
> +		    MIN(IAVF_MAX_VECTORS, IF_MAX_VECTORS),
> +		    INTRMAP_POWEROF2);
>  		nqueues = intrmap_count(sc->sc_intrmap);
>  		KASSERT(nqueues > 0);
>  		KASSERT(powerof2(nqueues));
> Index: dev/pci/if_ice.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_ice.c,v
> diff -u -p -r1.63 if_ice.c
> --- dev/pci/if_ice.c	10 Oct 2025 11:58:24 -0000	1.63
> +++ dev/pci/if_ice.c	27 Oct 2025 15:11:56 -0000
> @@ -30537,9 +30537,10 @@ ice_attach_hook(struct device *self)
>  		goto deinit_hw;
>  	}
>  	sc->sc_nmsix = nmsix;
> -	nqueues_max = MIN(sc->isc_nrxqsets_max, sc->isc_ntxqsets_max);
> +	nqueues_max = MIN(MIN(sc->isc_nrxqsets_max, sc->isc_ntxqsets_max),
> +	    ICE_MAX_VECTORS);
>  	sc->sc_intrmap = intrmap_create(&sc->sc_dev, sc->sc_nmsix - 1,
> -	    nqueues_max, INTRMAP_POWEROF2);
> +	    MIN(nqueues_max, IF_MAX_VECTORS), INTRMAP_POWEROF2);
>  	nqueues = intrmap_count(sc->sc_intrmap);
>  	KASSERT(nqueues > 0);
>  	KASSERT(powerof2(nqueues));
> Index: dev/pci/if_igc.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_igc.c,v
> diff -u -p -r1.28 if_igc.c
> --- dev/pci/if_igc.c	24 Jun 2025 11:00:27 -0000	1.28
> +++ dev/pci/if_igc.c	27 Oct 2025 15:11:56 -0000
> @@ -724,8 +724,8 @@ igc_setup_msix(struct igc_softc *sc)
>  	/* Give one vector to events. */
>  	nmsix--;
> 
> -	sc->sc_intrmap = intrmap_create(&sc->sc_dev, nmsix, IGC_MAX_VECTORS,
> -	    INTRMAP_POWEROF2);
> +	sc->sc_intrmap = intrmap_create(&sc->sc_dev, nmsix,
> +	    MIN(IGC_MAX_VECTORS, IF_MAX_VECTORS), INTRMAP_POWEROF2);
>  	sc->sc_nqueues = intrmap_count(sc->sc_intrmap);
>  }
> 
> Index: dev/pci/if_igc.h
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_igc.h,v
> diff -u -p -r1.4 if_igc.h
> --- dev/pci/if_igc.h	21 May 2024 11:19:39 -0000	1.4
> +++ dev/pci/if_igc.h	27 Oct 2025 15:11:56 -0000
> @@ -174,7 +174,7 @@
> 
>  #define IGC_PCIREG		PCI_MAPREG_START
> 
> -#define IGC_MAX_VECTORS		8
> +#define IGC_MAX_VECTORS		4
> 
>  /* Enable/disable debugging statements in shared code */
>  #define DBG	0
> Index: dev/pci/if_ix.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_ix.c,v
> diff -u -p -r1.221 if_ix.c
> --- dev/pci/if_ix.c	24 Jun 2025 11:02:03 -0000	1.221
> +++ dev/pci/if_ix.c	27 Oct 2025 15:11:56 -0000
> @@ -37,6 +37,8 @@
>  #include
>  #include
> 
> +#define IX_MAX_VECTORS		64
> +
>  /*
>   * Our TCP/IP Stack is unable to handle packets greater than MAXMCLBYTES.
>   * This interface is unable to handle packets greater than IXGBE_TSO_SIZE.
> @@ -1851,10 +1853,12 @@ ixgbe_setup_msix(struct ix_softc *sc)
>  	/* give one vector to events */
>  	nmsix--;
> 
> +	maxq = IX_MAX_VECTORS;
>  	/* XXX the number of queues is limited to what we can keep stats on */
> -	maxq = (sc->hw.mac.type == ixgbe_mac_82598EB) ? 8 : 16;
> -
> -	sc->sc_intrmap = intrmap_create(&sc->dev, nmsix, maxq, 0);
> +	if (sc->hw.mac.type == ixgbe_mac_82598EB)
> +		maxq = 8;
> +	sc->sc_intrmap = intrmap_create(&sc->dev, nmsix,
> +	    MIN(maxq, IF_MAX_VECTORS), 0);
>  	sc->num_queues = intrmap_count(sc->sc_intrmap);
>  }
> 
> Index: dev/pci/if_ixl.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_ixl.c,v
> diff -u -p -r1.110 if_ixl.c
> --- dev/pci/if_ixl.c	11 Oct 2025 18:34:24 -0000	1.110
> +++ dev/pci/if_ixl.c	27 Oct 2025 15:11:56 -0000
> @@ -100,7 +100,7 @@
>  #define CACHE_LINE_SIZE	64
>  #endif
> 
> -#define IXL_MAX_VECTORS			8 /* XXX this is pretty arbitrary */
> +#define IXL_MAX_VECTORS			1536
> 
>  #define I40E_MASK(mask, shift)		((mask) << (shift))
>  #define I40E_PF_RESET_WAIT_COUNT	200
> @@ -1780,8 +1780,9 @@ ixl_attach(struct device *parent, struct
>  	if (nmsix > 1) { /* we used 1 (the 0th) for the adminq */
>  		nmsix--;
> 
> -		sc->sc_intrmap = intrmap_create(&sc->sc_dev,
> -		    nmsix, IXL_MAX_VECTORS, INTRMAP_POWEROF2);
> +		sc->sc_intrmap = intrmap_create(&sc->sc_dev, nmsix,
> +		    MIN(IXL_MAX_VECTORS, IF_MAX_VECTORS),
> +		    INTRMAP_POWEROF2);
>  		nqueues = intrmap_count(sc->sc_intrmap);
>  		KASSERT(nqueues > 0);
>  		KASSERT(powerof2(nqueues));
> Index: dev/pci/if_mcx.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_mcx.c,v
> diff -u -p -r1.119 if_mcx.c
> --- dev/pci/if_mcx.c	5 Mar 2025 06:44:02 -0000	1.119
> +++ dev/pci/if_mcx.c	27 Oct 2025 15:11:56 -0000
> @@ -2933,8 +2933,8 @@ mcx_attach(struct device *parent, struct
>  	}
> 
>  	msix--; /* admin ops took one */
> -	sc->sc_intrmap = intrmap_create(&sc->sc_dev, msix, MCX_MAX_QUEUES,
> -	    INTRMAP_POWEROF2);
> +	sc->sc_intrmap = intrmap_create(&sc->sc_dev, msix,
> +	    MIN(MCX_MAX_QUEUES, IF_MAX_VECTORS), INTRMAP_POWEROF2);
>  	if (sc->sc_intrmap == NULL) {
>  		printf(": unable to create interrupt map\n");
>  		goto teardown;
> Index: dev/pci/if_ngbe.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_ngbe.c,v
> diff -u -p -r1.7 if_ngbe.c
> --- dev/pci/if_ngbe.c	24 Jun 2025 11:04:15 -0000	1.7
> +++ dev/pci/if_ngbe.c	27 Oct 2025 15:11:56 -0000
> @@ -1074,8 +1074,8 @@ ngbe_setup_msix(struct ngbe_softc *sc)
>  	/* Give one vector to events. */
>  	nmsix--;
> 
> -	sc->sc_intrmap = intrmap_create(&sc->sc_dev, nmsix, NGBE_MAX_VECTORS,
> -	    INTRMAP_POWEROF2);
> +	sc->sc_intrmap = intrmap_create(&sc->sc_dev, nmsix,
> +	    MIN(NGBE_MAX_VECTORS, IF_MAX_VECTORS), INTRMAP_POWEROF2);
>  	sc->sc_nqueues = intrmap_count(sc->sc_intrmap);
> 
>  	return 0;
> Index: dev/pci/if_vmx.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_vmx.c,v
> diff -u -p -r1.93 if_vmx.c
> --- dev/pci/if_vmx.c	19 Jun 2025 09:36:21 -0000	1.93
> +++ dev/pci/if_vmx.c	27 Oct 2025 15:11:56 -0000
> @@ -317,7 +317,8 @@ vmxnet3_attach(struct device *parent, st
> 
>  			isr = vmxnet3_intr_event;
>  			sc->sc_intrmap = intrmap_create(&sc->sc_dev,
> -			    msix, VMX_MAX_QUEUES, INTRMAP_POWEROF2);
> +			    msix, MIN(VMX_MAX_QUEUES, IF_MAX_VECTORS),
> +			    INTRMAP_POWEROF2);
>  			sc->sc_nqueues = intrmap_count(sc->sc_intrmap);
>  		}
>  		break;
> Index: net/if.h
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.h,v
> diff -u -p -r1.221 if.h
> --- net/if.h	9 Sep 2025 09:16:18 -0000	1.221
> +++ net/if.h	27 Oct 2025 15:11:56 -0000
> @@ -526,6 +526,9 @@ struct if_sffpage {
>  #include
> 
>  #ifdef _KERNEL
> +
> +#define IF_MAX_VECTORS	8
> +
>  struct socket;
>  struct ifnet;
>  struct ifq_ops;
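
For reference, every driver hunk above applies the same pattern, sketched
below for a hypothetical foo(4) driver.  This is only an editorial sketch,
not part of the diff; foo_softc, foo_setup_queues() and FOO_MAX_VECTORS are
made-up stand-ins for a driver's softc, its MSI-X setup path and its
hardware queue limit.

/*
 * Editorial sketch (not part of the diff): a driver combines its own
 * hardware queue limit with the global IF_MAX_VECTORS before creating
 * the interrupt map.  foo(4), foo_softc and FOO_MAX_VECTORS are
 * hypothetical.
 */
#include <sys/param.h>		/* MIN() */
#include <sys/device.h>
#include <sys/intrmap.h>	/* intrmap_create(), intrmap_count() */

#include <dev/pci/pcivar.h>	/* pci_intr_msix_count() */

#include <net/if.h>		/* IF_MAX_VECTORS */

#define FOO_MAX_VECTORS		16	/* hypothetical hardware limit */

struct foo_softc {
	struct device		 sc_dev;
	struct intrmap		*sc_intrmap;
	unsigned int		 sc_nqueues;
};

void
foo_setup_queues(struct foo_softc *sc, struct pci_attach_args *pa)
{
	int nmsix = pci_intr_msix_count(pa);

	if (nmsix > 1) {
		/* One vector stays reserved for admin/link events. */
		nmsix--;
		/*
		 * Request at most the smaller of the hardware limit and
		 * the global IF_MAX_VECTORS; intrmap_create() further
		 * clamps this to the available vectors and CPUs and, with
		 * INTRMAP_POWEROF2, rounds down to a power of two.
		 */
		sc->sc_intrmap = intrmap_create(&sc->sc_dev, nmsix,
		    MIN(FOO_MAX_VECTORS, IF_MAX_VECTORS), INTRMAP_POWEROF2);
		sc->sc_nqueues = intrmap_count(sc->sc_intrmap);
	} else
		sc->sc_nqueues = 1;
}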