
From:
Mark Kettenis <mark.kettenis@xs4all.nl>
Subject:
Re: spread 8 network interrupt over cpu and softnet
To:
Alexander Bluhm <bluhm@openbsd.org>
Cc:
jonathan@d14n.org, tech@openbsd.org
Date:
Tue, 11 Nov 2025 17:13:01 +0100

> Date: Tue, 11 Nov 2025 12:20:27 +0100
> From: Alexander Bluhm <bluhm@openbsd.org>
> 
> I would like to commit the global IF_MAX_VECTORS limit.  Then it
> is easier for me to find optimal values for number of softnet threads
> and interface queues.
> 
> ok?

ok kettenis@

> On Mon, Oct 27, 2025 at 05:27:59PM +0100, Alexander Bluhm wrote:
> > On Mon, Oct 13, 2025 at 02:04:03PM +1000, Jonathan Matthew wrote:
> > > On Tue, Oct 07, 2025 at 02:44:24PM +0200, Alexander Bluhm wrote:
> > > > On Tue, Oct 07, 2025 at 05:33:35PM +1000, Jonathan Matthew wrote:
> > > > > On Sat, Sep 27, 2025 at 06:36:50PM +0200, Alexander Bluhm wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > Currently most network drivers use 8 queues and distribute interrupts.
> > > > > > ix(4) is the only one that may allocate more than 8.  ice(4) uses
> > > > > > only 8 vectors, but allocates more interrupts.
> > > > > > 
> > > > > > I would like to limit interrupts and queues to 8 for all drivers.
> > > > > > This prevents running out of interrupt vectors so easily.  Currently
> > > > > > we have 8 softnet threads which limits parallel processing anyway.
> > > > > > We can tune these values later.
> > > > > > 
> > > > > > With this diff, forwarding throughput increases from 45 to 50 GBit/sec
> > > > > > on my 12-core machine from ice0 to ice1.  I am using iperf3 TCP on
> > > > > > Linux to measure it.  I think the slower performance happened because
> > > > > > ice(4) was allocating more interrupts than it has vectors.  Now
> > > > > > everything is limited to 8.  For other drivers I see no difference
> > > > > > as they operate at line speed anyway.
> > > > > > 
> > > > > > I think the arbitrary numbers IXL_MAX_VECTORS and IGC_MAX_VECTORS
> > > > > > could go away, but those would be per-driver diffs.
> > > > > 
> > > > > I think the changes to bnxt, igc and ixl, where you're just picking the
> > > > > smaller of two constants, neither of which reflects an actual limit on
> > > > > the number of queues available, should be removed from the diff.
> > > > > We've mostly settled on 8 as the maximum number of queues to use anyway,
> > > > > so this doesn't really change anything.
> > > > 
> > > > Are there any hardware limits?  As I want to play with IF_MAX_VECTORS
> > > > globally, it would be nice to know the capabilities of the hardware.
> > > > That's why I wanted to address this on a per driver basis.
> > > > 
> > > > igc(4) says
> > > > #define IGC_MAX_VECTORS                8
> > > > without explanation.  Is this an actual hardware limit?
> > > 
> > > The I225/226 datasheet says the hardware is limited to 4 queues.
> > > On the only igc hardware I have, the msi-x table has 5 entries,
> > > so we could never use 8 queues there anyway.
> > > 
> > > > 
> > > > ix has these lines
> > > >         /* XXX the number of queues is limited to what we can keep stats on */
> > > >         maxq = (sc->hw.mac.type == ixgbe_mac_82598EB) ? 8 : 16;
> > > > 
> > > > Are the 8 and 16 restrictions of the hardware?  IF_MAX_VECTORS
> > > > should be the value for optimal system behavior.  On top each driver
> > > > has its own limitations.  That's how I came to the minimum calculation.
> > > 
> > > The limit here is the number of stats registers.  ix hardware has 64
> > > tx/rx queues, but not enough registers to read statistics off all of them.
> > > 
> > > Other cases aside from ice:
> > > bnxt - nic firmware gives us limits on the number of msi-x vectors and tx/rx
> > > queues, but we currently ignore them and assume 8 will be available.
> > > 
> > > ixl - the datasheet says we can create up to 1536 tx/rx queue pairs, and
> > > the number of msi-x vectors available is given in the pci capability
> > > structure, which we already take into account.
> > > 
> > > iavf - limited to 4 queues according to the virtual function interface
> > > specification.
> > > 
> > > mcx - nic firmware gives us limits on the number of send/receive queues,
> > > completion queues and event queues, the lowest of which we should use as
> > > the maximum number of queues, but we currently just assume we can use 16
> > > queues.
> > > 
> > > aq - register layout only has space for 8 queues as far as I can tell,
> > > so 8 is a hardware limit.  Seems to have a bigger msi-x table though.
> > > 
> > > vmx - seems to be limited to 8 tx and 16 rx queues, so we use 8 at most.
> > > 
> > > ngbe - datasheet says 8 tx and rx queues.
> > 
> > Thanks for extracting all these details from the datasheets.
> > 
> > My idea is to give each driver its hardware limit and then calculate
> > the minimum with a global limit.  So we get somewhat consistent
> > values and can find the optimum setting in the future.
> > 
> > I have tested it with bnxt, ice, igc, ix, ixl.
> > 
> > For aq, iavf, mcx, ngbe, vmx I have no hardware currently in my
> > setup.  Except for mcx the values do not change with my diff.
> > 
> > Is this diff a way we can move forward?
> > 
> > bluhm
> > 
> > Index: dev/pci/if_aq_pci.c
> > ===================================================================
> > RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_aq_pci.c,v
> > diff -u -p -r1.32 if_aq_pci.c
> > --- dev/pci/if_aq_pci.c	29 Jun 2025 19:32:08 -0000	1.32
> > +++ dev/pci/if_aq_pci.c	27 Oct 2025 15:11:56 -0000
> > @@ -1310,8 +1310,8 @@ aq_attach(struct device *parent, struct 
> >  		int nmsix = pci_intr_msix_count(pa);
> >  		if (nmsix > 1) {
> >  			nmsix--;
> > -			sc->sc_intrmap = intrmap_create(&sc->sc_dev,
> > -			    nmsix, AQ_MAXQ, INTRMAP_POWEROF2);
> > +			sc->sc_intrmap = intrmap_create(&sc->sc_dev, nmsix,
> > +			    MIN(AQ_MAXQ, IF_MAX_VECTORS), INTRMAP_POWEROF2);
> >  			sc->sc_nqueues = intrmap_count(sc->sc_intrmap);
> >  			KASSERT(sc->sc_nqueues > 0);
> >  			KASSERT(powerof2(sc->sc_nqueues));
> > Index: dev/pci/if_bnxt.c
> > ===================================================================
> > RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_bnxt.c,v
> > diff -u -p -r1.59 if_bnxt.c
> > --- dev/pci/if_bnxt.c	13 Oct 2025 10:45:08 -0000	1.59
> > +++ dev/pci/if_bnxt.c	27 Oct 2025 15:11:56 -0000
> > @@ -545,9 +545,11 @@ bnxt_attach(struct device *parent, struc
> >  		nmsix = pci_intr_msix_count(pa);
> >  		if (nmsix > 1) {
> >  			sc->sc_ih = pci_intr_establish(sc->sc_pc, ih,
> > -			    IPL_NET | IPL_MPSAFE, bnxt_admin_intr, sc, DEVNAME(sc));
> > -			sc->sc_intrmap = intrmap_create(&sc->sc_dev,
> > -			    nmsix - 1, BNXT_MAX_QUEUES, INTRMAP_POWEROF2);
> > +			    IPL_NET | IPL_MPSAFE, bnxt_admin_intr, sc,
> > +			    DEVNAME(sc));
> > +			sc->sc_intrmap = intrmap_create(&sc->sc_dev, nmsix - 1,
> > +			    MIN(BNXT_MAX_QUEUES, IF_MAX_VECTORS),
> > +			    INTRMAP_POWEROF2);
> >  			sc->sc_nqueues = intrmap_count(sc->sc_intrmap);
> >  			KASSERT(sc->sc_nqueues > 0);
> >  			KASSERT(powerof2(sc->sc_nqueues));
> > Index: dev/pci/if_iavf.c
> > ===================================================================
> > RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_iavf.c,v
> > diff -u -p -r1.25 if_iavf.c
> > --- dev/pci/if_iavf.c	24 Jun 2025 10:59:15 -0000	1.25
> > +++ dev/pci/if_iavf.c	27 Oct 2025 15:11:56 -0000
> > @@ -1034,8 +1034,9 @@ iavf_attach(struct device *parent, struc
> >  		if (nmsix > 1) { /* we used 1 (the 0th) for the adminq */
> >  			nmsix--;
> >  
> > -			sc->sc_intrmap = intrmap_create(&sc->sc_dev,
> > -			    nmsix, IAVF_MAX_VECTORS, INTRMAP_POWEROF2);
> > +			sc->sc_intrmap = intrmap_create(&sc->sc_dev, nmsix,
> > +			    MIN(IAVF_MAX_VECTORS, IF_MAX_VECTORS),
> > +			    INTRMAP_POWEROF2);
> >  			nqueues = intrmap_count(sc->sc_intrmap);
> >  			KASSERT(nqueues > 0);
> >  			KASSERT(powerof2(nqueues));
> > Index: dev/pci/if_ice.c
> > ===================================================================
> > RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_ice.c,v
> > diff -u -p -r1.63 if_ice.c
> > --- dev/pci/if_ice.c	10 Oct 2025 11:58:24 -0000	1.63
> > +++ dev/pci/if_ice.c	27 Oct 2025 15:11:56 -0000
> > @@ -30537,9 +30537,10 @@ ice_attach_hook(struct device *self)
> >  		goto deinit_hw;
> >  	}
> >  	sc->sc_nmsix = nmsix;
> > -	nqueues_max = MIN(sc->isc_nrxqsets_max, sc->isc_ntxqsets_max);
> > +	nqueues_max = MIN(MIN(sc->isc_nrxqsets_max, sc->isc_ntxqsets_max),
> > +	    ICE_MAX_VECTORS);
> >  	sc->sc_intrmap = intrmap_create(&sc->sc_dev, sc->sc_nmsix - 1,
> > -	    nqueues_max, INTRMAP_POWEROF2);
> > +	    MIN(nqueues_max, IF_MAX_VECTORS), INTRMAP_POWEROF2);
> >  	nqueues = intrmap_count(sc->sc_intrmap);
> >  	KASSERT(nqueues > 0);
> >  	KASSERT(powerof2(nqueues));
> > Index: dev/pci/if_igc.c
> > ===================================================================
> > RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_igc.c,v
> > diff -u -p -r1.28 if_igc.c
> > --- dev/pci/if_igc.c	24 Jun 2025 11:00:27 -0000	1.28
> > +++ dev/pci/if_igc.c	27 Oct 2025 15:11:56 -0000
> > @@ -724,8 +724,8 @@ igc_setup_msix(struct igc_softc *sc)
> >  	/* Give one vector to events. */
> >  	nmsix--;
> >  
> > -	sc->sc_intrmap = intrmap_create(&sc->sc_dev, nmsix, IGC_MAX_VECTORS,
> > -	    INTRMAP_POWEROF2);
> > +	sc->sc_intrmap = intrmap_create(&sc->sc_dev, nmsix,
> > +	    MIN(IGC_MAX_VECTORS, IF_MAX_VECTORS), INTRMAP_POWEROF2);
> >  	sc->sc_nqueues = intrmap_count(sc->sc_intrmap);
> >  }
> >  
> > Index: dev/pci/if_igc.h
> > ===================================================================
> > RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_igc.h,v
> > diff -u -p -r1.4 if_igc.h
> > --- dev/pci/if_igc.h	21 May 2024 11:19:39 -0000	1.4
> > +++ dev/pci/if_igc.h	27 Oct 2025 15:11:56 -0000
> > @@ -174,7 +174,7 @@
> >  
> >  #define IGC_PCIREG		PCI_MAPREG_START
> >  
> > -#define IGC_MAX_VECTORS		8
> > +#define IGC_MAX_VECTORS		4
> >  
> >  /* Enable/disable debugging statements in shared code */
> >  #define DBG	0
> > Index: dev/pci/if_ix.c
> > ===================================================================
> > RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_ix.c,v
> > diff -u -p -r1.221 if_ix.c
> > --- dev/pci/if_ix.c	24 Jun 2025 11:02:03 -0000	1.221
> > +++ dev/pci/if_ix.c	27 Oct 2025 15:11:56 -0000
> > @@ -37,6 +37,8 @@
> >  #include <dev/pci/if_ix.h>
> >  #include <dev/pci/ixgbe_type.h>
> >  
> > +#define IX_MAX_VECTORS			64
> > +
> >  /*
> >   * Our TCP/IP Stack is unable to handle packets greater than MAXMCLBYTES.
> >   * This interface is unable to handle packets greater than IXGBE_TSO_SIZE.
> > @@ -1851,10 +1853,12 @@ ixgbe_setup_msix(struct ix_softc *sc)
> >  	/* give one vector to events */
> >  	nmsix--;
> >  
> > +	maxq = IX_MAX_VECTORS;
> >  	/* XXX the number of queues is limited to what we can keep stats on */
> > -	maxq = (sc->hw.mac.type == ixgbe_mac_82598EB) ? 8 : 16;
> > -
> > -	sc->sc_intrmap = intrmap_create(&sc->dev, nmsix, maxq, 0);
> > +	if (sc->hw.mac.type == ixgbe_mac_82598EB)
> > +		maxq = 8;
> > +	sc->sc_intrmap = intrmap_create(&sc->dev, nmsix,
> > +	    MIN(maxq, IF_MAX_VECTORS), 0);
> >  	sc->num_queues = intrmap_count(sc->sc_intrmap);
> >  }
> >  
> > Index: dev/pci/if_ixl.c
> > ===================================================================
> > RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_ixl.c,v
> > diff -u -p -r1.110 if_ixl.c
> > --- dev/pci/if_ixl.c	11 Oct 2025 18:34:24 -0000	1.110
> > +++ dev/pci/if_ixl.c	27 Oct 2025 15:11:56 -0000
> > @@ -100,7 +100,7 @@
> >  #define CACHE_LINE_SIZE 64
> >  #endif
> >  
> > -#define IXL_MAX_VECTORS			8 /* XXX this is pretty arbitrary */
> > +#define IXL_MAX_VECTORS			1536
> >  
> >  #define I40E_MASK(mask, shift)		((mask) << (shift))
> >  #define I40E_PF_RESET_WAIT_COUNT	200
> > @@ -1780,8 +1780,9 @@ ixl_attach(struct device *parent, struct
> >  		if (nmsix > 1) { /* we used 1 (the 0th) for the adminq */
> >  			nmsix--;
> >  
> > -			sc->sc_intrmap = intrmap_create(&sc->sc_dev,
> > -			    nmsix, IXL_MAX_VECTORS, INTRMAP_POWEROF2);
> > +			sc->sc_intrmap = intrmap_create(&sc->sc_dev, nmsix,
> > +			    MIN(IXL_MAX_VECTORS, IF_MAX_VECTORS),
> > +			    INTRMAP_POWEROF2);
> >  			nqueues = intrmap_count(sc->sc_intrmap);
> >  			KASSERT(nqueues > 0);
> >  			KASSERT(powerof2(nqueues));
> > Index: dev/pci/if_mcx.c
> > ===================================================================
> > RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_mcx.c,v
> > diff -u -p -r1.119 if_mcx.c
> > --- dev/pci/if_mcx.c	5 Mar 2025 06:44:02 -0000	1.119
> > +++ dev/pci/if_mcx.c	27 Oct 2025 15:11:56 -0000
> > @@ -2933,8 +2933,8 @@ mcx_attach(struct device *parent, struct
> >  	}
> >  
> >  	msix--; /* admin ops took one */
> > -	sc->sc_intrmap = intrmap_create(&sc->sc_dev, msix, MCX_MAX_QUEUES,
> > -	    INTRMAP_POWEROF2);
> > +	sc->sc_intrmap = intrmap_create(&sc->sc_dev, msix,
> > +	    MIN(MCX_MAX_QUEUES, IF_MAX_VECTORS), INTRMAP_POWEROF2);
> >  	if (sc->sc_intrmap == NULL) {
> >  		printf(": unable to create interrupt map\n");
> >  		goto teardown;
> > Index: dev/pci/if_ngbe.c
> > ===================================================================
> > RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_ngbe.c,v
> > diff -u -p -r1.7 if_ngbe.c
> > --- dev/pci/if_ngbe.c	24 Jun 2025 11:04:15 -0000	1.7
> > +++ dev/pci/if_ngbe.c	27 Oct 2025 15:11:56 -0000
> > @@ -1074,8 +1074,8 @@ ngbe_setup_msix(struct ngbe_softc *sc)
> >  	/* Give one vector to events. */
> >  	nmsix--;
> >  
> > -	sc->sc_intrmap = intrmap_create(&sc->sc_dev, nmsix, NGBE_MAX_VECTORS,
> > -	    INTRMAP_POWEROF2);
> > +	sc->sc_intrmap = intrmap_create(&sc->sc_dev, nmsix,
> > +	    MIN(NGBE_MAX_VECTORS, IF_MAX_VECTORS), INTRMAP_POWEROF2);
> >  	sc->sc_nqueues = intrmap_count(sc->sc_intrmap);
> >  
> >  	return 0;
> > Index: dev/pci/if_vmx.c
> > ===================================================================
> > RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_vmx.c,v
> > diff -u -p -r1.93 if_vmx.c
> > --- dev/pci/if_vmx.c	19 Jun 2025 09:36:21 -0000	1.93
> > +++ dev/pci/if_vmx.c	27 Oct 2025 15:11:56 -0000
> > @@ -317,7 +317,8 @@ vmxnet3_attach(struct device *parent, st
> >  
> >  				isr = vmxnet3_intr_event;
> >  				sc->sc_intrmap = intrmap_create(&sc->sc_dev,
> > -				    msix, VMX_MAX_QUEUES, INTRMAP_POWEROF2);
> > +				    msix, MIN(VMX_MAX_QUEUES, IF_MAX_VECTORS),
> > +				    INTRMAP_POWEROF2);
> >  				sc->sc_nqueues = intrmap_count(sc->sc_intrmap);
> >  			}
> >  			break;
> > Index: net/if.h
> > ===================================================================
> > RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.h,v
> > diff -u -p -r1.221 if.h
> > --- net/if.h	9 Sep 2025 09:16:18 -0000	1.221
> > +++ net/if.h	27 Oct 2025 15:11:56 -0000
> > @@ -526,6 +526,9 @@ struct if_sffpage {
> >  #include <net/if_arp.h>
> >  
> >  #ifdef _KERNEL
> > +
> > +#define IF_MAX_VECTORS		8
> > +
> >  struct socket;
> >  struct ifnet;
> >  struct ifq_ops;
> 
>