From: Visa Hankala <visa@hankala.org>
Subject: Re: octeon: commuliative patch LRO, cnmac queue and softens
To: "Kirill A. Korinsky" <kirill@korins.ky>
Cc: tech@openbsd.org
Date: Sun, 5 Apr 2026 11:52:19 +0000

On Sun, Apr 05, 2026 at 11:35:44AM +0200, Kirill A. Korinsky wrote:
> On Fri, 03 Apr 2026 16:28:34 +0200,
> Visa Hankala <visa@hankala.org> wrote:
> > 
> > > @@ -108,22 +110,30 @@ cn30xxpip_port_config(struct cn30xxpip_s
> > >  	/* SKIP=0 */
> > >  
> > >  	prt_tag = 0;
> > > +	SET(prt_tag, PIP_PRT_TAGN_INC_VLAN);
> > >  	SET(prt_tag, PIP_PRT_TAGN_INC_PRT);
> > 
> > I wonder if VLAN id and input port number should be left out from
> > the packet tag. This would make the tag symmetric with regard to IP
> > addresses and TCP/UDP ports, and let the same CPU core handle both
> > directions of TCP/UDP flows. This might improve CPU cache locality
> > and performance when forwarding multiple flows. Of course, the
> > symmetry is lost if packets are transformed, for example by NAT
> > or tunneling.
> >
> 
> Not sure that I get the idea of symmetric. Right now it uses SRC and DST
> addresses and ports, and for an opposite direction it should have reversed
> addresses and ports, isn't it?

By symmetric hash/tag I mean that
hash(saddr, sport, daddr, dport) = hash(daddr, dport, saddr, sport).
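
To illustrate the property, here is a minimal sketch in C. It is not what
the PIP hardware computes; the struct, the function names and the choice
of mixer are all assumptions made up for this example. The only point is
that the two endpoints are put into a canonical order before hashing, so
both directions of a flow produce the same value and hence the same queue.

```c
#include <stdint.h>

/* Hypothetical flow key; all names here are made up for illustration. */
struct flow_key {
	uint32_t saddr, daddr;	/* IPv4 addresses */
	uint16_t sport, dport;	/* TCP/UDP ports */
};

/* Integer mixer (the MurmurHash3 32-bit finalizer). */
static uint32_t
mix32(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85ebca6bU;
	h ^= h >> 13;
	h *= 0xc2b2ae35U;
	h ^= h >> 16;
	return h;
}

/*
 * Order the two endpoints canonically before hashing, so that swapping
 * source and destination feeds the same input to the mixer.
 */
static uint32_t
flow_hash_symmetric(const struct flow_key *k)
{
	uint64_t a = ((uint64_t)k->saddr << 16) | k->sport;
	uint64_t b = ((uint64_t)k->daddr << 16) | k->dport;
	uint64_t lo = a < b ? a : b;
	uint64_t hi = a < b ? b : a;
	uint32_t h;

	h = mix32((uint32_t)lo ^ (uint32_t)(lo >> 32));
	h = mix32(h ^ (uint32_t)hi);
	h = mix32(h ^ (uint32_t)(hi >> 32));
	return h;
}

/* Map a flow to one of nqueues receive queues (nqueues must be > 0). */
static unsigned int
flow_to_queue(const struct flow_key *k, unsigned int nqueues)
{
	return flow_hash_symmetric(k) % nqueues;
}
```

With such a hash, flow_to_queue() returns the same queue for a packet and
for its reply. Mixing the VLAN id or the input port into the hash breaks
this whenever the two directions arrive on different VLANs or ports.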

> Also, VLAN id and input port number are here to make a different tag for
> these cases:
>  - routing traffic between vlans on the same port
>  - and routing traffic between ports on the same vlan

In my opinion both directions of a flow should be processed by the same
CPU core when possible. This should improve scaling in terms of total
throughput because the state tracking data do not need to go back and
forth between CPU cores; faster access, less contention.

> Numbers for iperf in single thread when two machines are in different vlan
> but on the same cnmac:

I think multi-queue processing should be benchmarked with multiple
flows. Also, I believe it is more common to forward traffic between
different ports.

With a single flow and four cores, you are leaving processing capacity
unused.

> 1) -current:
> 
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec   324 MBytes   272 Mbits/sec    0             sender
> [  5]   0.00-10.01  sec   324 MBytes   272 Mbits/sec                  receiver
> 
> 2) without VLAN and port number:
> 
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec   397 MBytes   333 Mbits/sec    0             sender
> [  5]   0.00-10.01  sec   396 MBytes   332 Mbits/sec                  receiver
> 
> 3) with VLAN and port number:
> 
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec   442 MBytes   371 Mbits/sec    0             sender
> [  5]   0.00-10.00  sec   442 MBytes   371 Mbits/sec                  receiver
> 
> and in case (3) I see that two softnets are active
> 
> -- 
> wbr, Kirill
>