Index | Thread | Search

From:
Alexander Bluhm <bluhm@openbsd.org>
Subject:
Re: SoftLRO for ixl(4), bnxt(4) and em(4)
To:
Jan Klemkow <jan@openbsd.org>
Cc:
tech@openbsd.org, Janne Johansson <icepic.dz@gmail.com>, Yuichiro NAITO <naito.yuichiro@gmail.com>
Date:
Thu, 20 Mar 2025 17:37:30 +0100

Download raw body.

Thread
On Thu, Mar 20, 2025 at 03:20:25PM +0100, Alexander Bluhm wrote:
> On Tue, Mar 04, 2025 at 08:02:31PM +0100, Jan Klemkow wrote:
> > On Fri, Nov 15, 2024 at 11:30:08AM GMT, Jan Klemkow wrote:
> > > On Thu, Nov 07, 2024 at 11:30:26AM GMT, David Gwynne wrote:
> > > > On Thu, Nov 07, 2024 at 01:10:10AM +0100, Jan Klemkow wrote:
> > > > > This diff introduces a software solution for TCP Large Receive Offload
> > > > > (SoftLRO) for network interfaces don't hat hardware support for it.
> > > > > This is needes at least for newer Intel interfaces as their
> > > > > documentation said that LRO a.k.a. Receive Side Coalescing (RSC) has to
> > > > > be done by software.
> > > > > This diff coalesces TCP segments during the receive interrupt before
> > > > > queueing them.  Thus, our TCP/IP stack has to process less packet
> > > > > headers per amount of received data.
> > > > >
> > > > > I measured receiving performance with Intel XXV710 25 GbE interfaces.
> > > > > It increased from 6 Gbit/s to 23 Gbit/s.
> > > > >
> > > > > Even if we saturate em(4) without any of these technique its also part
> > > > > this diff.  I'm interested if this diff helps to reach 1 Gbit/s on old
> > > > > or slow hardware.
> > > > >
> > > > > I also add bnxt(4) to this diff to increase test coverage.  If you want
> > > > > to tests this implementation with your favorite interface, just replace
> > > > > the ml_enqueue() call with the new tcp_softlro_enqueue() (as seen
> > > > > below).  It should work with all kind network interfaces.
> > > > >
> > > > > Any comments and tests reports are welcome.
> > > >
> > > > nice.
> > > >
> > > > i would argue this should be ether_softlro_enqueue and put in
> > > > if_ethersubr.c because it's specific to ethernet interfaces. we don't
> > > > really have any other type of interface that bundles reception of
> > > > packets that we can take advantage of like this, and internally it
> > > > assumes it's pulling ethernet packets apart.
> > > >
> > > > aside from that, just a few comments on the code.
> > >
> > > I adapted your comments in the diff below.
> >
> > I refactored the SoftLRO diff.  You just need to add the flags IFXF_LRO
> > / IFCAP_LRO, and repalce ml_enqueue() with tcp_softlro_enqueue() to
> > enable this on you favorit network device.
> >
> > Janne: I adjusted your diff with correct headers.  But, I'm unable to
> > test this part of the diff below, due to lack of hardware.  Could you
> > test it again?
> >
> > Yuichiro: Could you also retest your UDP/TCP forwarding test?  I added a
> > short path for non-TCP packets in the ixl(4) driver.  Maybe its better
> > now.
> >
> > Further tests and comments are welcome.
> 
> As release lock will come soon, we should be careful.  I think we
> can add the code if we ensure that default behavior does not change
> anything.
> 
> We should start with ixl(4), ifconfig tcplro flag disabled per
> default.  Then we can improve the TCP code in tree and add other
> drivers later.  I found some issues like ethernet padding and
> TCP option parsing that should be improved.  But doing this
> on top of an initial commit is easier.
> 
> With some minor changes I am OK with commiting the diff below.
> 
> - remove all drivers except ixl(4) from the diff
> - turn off ifconfig tcplro per default
> - make sure that ixl(4) logic does not change, if tcplro is off
> - I have fixed the livelocked logic in ixl(4)
> - I renamed the function to tcp_enqueue_lro(), it is more consistent
>   with the TSO functions.
> 
> As said before, tcp_softlro() needs more love, but it want to
> do this in tree.
> 
> With my comments, jan@'s diff looks like this and would be OK bluhm@

Oops, I got a uvm fault in ixl_txeof() with this diff.  Test was
sending single stream TCP from Linux to Linux machine while OpenBSD
was forwarding.  ixl(4) LRO was activated.  So it looks like it was
sending a packet that was previously received with LRO.

uvm_fault(0xffffffff82ab4180, 0xfffffd90d6fd4ff8, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      ixl_txeof+0x197:        movl    0x8(%rdx,%rcx,1),%ecx
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
 314157  74541      0     0x14000      0x200    2  softnet3
 455485  27135      0     0x14000      0x200    1  softnet2
 432414  17394      0     0x14000      0x200    3  softnet1
ixl_txeof(ffff800000188000,ffff800000e1cf00) at ixl_txeof+0x197
ixl_intr_vector(ffff800000185300) at ixl_intr_vector+0x57
intr_handler(ffff800032c770c0,ffff800000175000) at intr_handler+0x91
Xintr_ioapic_edge28_untramp() at Xintr_ioapic_edge28_untramp+0x18f
acpicpu_idle() at acpicpu_idle+0x239
sched_idle(ffffffff829f2ff0) at sched_idle+0x298
end trace frame: 0x0, count: 9
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.

ddb{0}> show panic
*cpu0: uvm_fault(0xffffffff82ab4180, 0xfffffd90d6fd4ff8, 0, 1) -> e

ddb{0}> trace
ixl_txeof(ffff800000188000,ffff800000e1cf00) at ixl_txeof+0x197
ixl_intr_vector(ffff800000185300) at ixl_intr_vector+0x57
intr_handler(ffff800032c770c0,ffff800000175000) at intr_handler+0x91
Xintr_ioapic_edge28_untramp() at Xintr_ioapic_edge28_untramp+0x18f
acpicpu_idle() at acpicpu_idle+0x239
sched_idle(ffffffff829f2ff0) at sched_idle+0x298
end trace frame: 0x0, count: -6

ddb{0}> show register
rdi                              0x4
rsi               0xffff800001f6e5c0
rbp               0xffff800032c77010
rbx               0xffff800001f6c000
rdx               0xfffffd80d6fd5000
rcx                      0xffffffff0
rax                            0x192
r8                               0x8
r9                                 0
r10               0x2096f2a745fa867e
r11               0x26076eed88feaba6
r12                            0x196
r13                              0x1
r14                       0xffffffff
r15                            0x192
rip               0xffffffff81d8a047    ixl_txeof+0x197
cs                               0x8
rflags                       0x10206    __ALIGN_SIZE+0xf206
rsp               0xffff800032c76fa0
ss                              0x10
ixl_txeof+0x197:        movl    0x8(%rdx,%rcx,1),%ecx

ddb{0}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
 38518  412339  93610      0  3    0x10008a  kqread        ssh
 47141  497841  69680   1000  3    0x100082  kqread        tcpbench
 69680  482468  41677   1000  3    0x10008a  sigsusp       sh
 41677  137967      1      0  3    0x100088  sigsusp       ksh
 93610  431262  70721      0  3        0x82  piperd        perl
 70721  340494  84931      0  3    0x10008a  sigsusp       ksh
 84931  281543  76631      0  3        0x98  kqread        sshd-session
 76631  384998  40558      0  3        0x92  kqread        sshd-session
  2806  202482      1      0  3    0x100083  ttyin         getty
 99278   67342      1      0  3    0x100098  kqread        cron
 14460  108200      1     99  3   0x1100090  kqread        sndiod
 49145  492411      1    110  3    0x100090  kqread        sndiod
 88648  284214  17435     95  3   0x1100092  kqread        smtpd
 85068  231497  17435    103  3   0x1100092  kqread        smtpd
 48744  446192  17435     95  3   0x1100092  kqread        smtpd
 81038   44933  17435     95  3    0x100092  kqread        smtpd
 67852  435382  17435     95  3   0x1100092  kqread        smtpd
 95510  326790  17435     95  3   0x1100092  kqread        smtpd
 17435  136031      1      0  3    0x100080  kqread        smtpd
 66494  299815  44745     91  3        0x92  kqread        snmpd_metrics
  2427  408672  44745     91  3   0x1100092  kqread        snmpd
 44745  351156      1      0  3    0x100080  kqread        snmpd
 40558  490400      1      0  3        0x88  kqread        sshd
 33095  460368      0      0  3     0x14200  acct          acct
 86928  490746      0      0  3     0x14280  nfsidl        nfsio
 40308  501410      0      0  3     0x14280  nfsidl        nfsio
 14683  369864      0      0  3     0x14280  nfsidl        nfsio
 86115  363991      0      0  3     0x14280  nfsidl        nfsio
 80631  145485      1      0  3    0x100080  kqread        ntpd
 62683  483071  70890     83  3    0x100092  kqread        ntpd
 70890  270623      1     83  3   0x1100092  kqread        ntpd
 59526  317744  53327     74  3   0x1100092  bpf           pflogd
 53327  166856      1      0  3        0x80  sbwait        pflogd
 46009  192808  60738     73  3   0x1100090  kqread        syslogd
 60738  257468      1      0  3    0x100082  sbwait        syslogd
 27089   12018      1      0  3    0x100080  kqread        resolvd
 34559  118884  58086     77  3    0x100092  kqread        dhcpleased
 50650   31850  58086     77  3    0x100092  kqread        dhcpleased
 58086  260708      1      0  3        0x80  kqread        dhcpleased
 57483   42427  93971    115  3    0x100092  kqread        slaacd
 37908   42219  93971    115  3    0x100092  kqread        slaacd
 93971   13019      1      0  3    0x100080  kqread        slaacd
 82672  132677      0      0  3     0x14200  bored         wsdisplay1
 85742  374184      0      0  3     0x14200  bored         i915_flip
 72724  322745      0      0  3     0x14200  bored         i915_modeset
 36100   66877      0      0  3     0x14200  bored         card0-crtc2
 71449  154528      0      0  3     0x14200  bored         card0-crtc1
  4004  449792      0      0  3     0x14200  bored         card0-crtc0
 66389  398290      0      0  3     0x14200  bored         ttm
 25629  195671      0      0  3     0x14200  bored         i915-unordered
 30993    3026      0      0  3     0x14200  bored         i915-dp
 64044   17892      0      0  3     0x14200  bored         i915
 50213  284154      0      0  3     0x14200  bored         smr
 40695  150715      0      0  3     0x14200  pgzero        zerothread
 33090  235644      0      0  3     0x14200  aiodoned      aiodoned
 89464   61329      0      0  3     0x14200  syncer        update
 90507  444068      0      0  3     0x14200  cleaner       cleaner
 60811   26863      0      0  3     0x14200  reaper        reaper
 23643  261108      0      0  3     0x14200  pgdaemon      pagedaemon
 48598  436260      0      0  3     0x14200  usbtsk        usbtask
 44951  160405      0      0  3     0x14200  usbatsk       usbatsk
  9085  470607      0      0  3     0x14200  bored         drmtskl
 51075   14158      0      0  3     0x14200  bored         drmlwq
 57977  332542      0      0  3     0x14200  bored         drmlwq
 31093   34074      0      0  3     0x14200  bored         drmlwq
 10444  303766      0      0  3     0x14200  bored         drmlwq
 63278  505564      0      0  3     0x14200  bored         drmubwq
 87295  374846      0      0  3     0x14200  bored         drmubwq
 13798  142371      0      0  3     0x14200  bored         drmubwq
 25664  191160      0      0  3     0x14200  bored         drmubwq
  1920   14491      0      0  3     0x14200  bored         drmhpwq
 32074  194715      0      0  3     0x14200  bored         drmhpwq
 87299  408596      0      0  3     0x14200  bored         drmhpwq
 84700  196185      0      0  3     0x14200  bored         drmhpwq
 49492  134485      0      0  3     0x14200  bored         drmwq
 85317   16007      0      0  3     0x14200  bored         drmwq
 69861  222376      0      0  3     0x14200  bored         drmwq
 58837  418320      0      0  3     0x14200  bored         drmwq
 92915  102135      0      0  3  0x40014200  acpi0         acpi0
 62796  101256      0      0  3  0x40014200                idle3
 98829  504539      0      0  3  0x40014200                idle2
 74240    7414      0      0  3  0x40014200                idle1
 52393   30619      0      0  3     0x14200  bored         sensors
 74541  314157      0      0  7     0x14200                softnet3
 27135  455485      0      0  7     0x14200                softnet2
 17394  432414      0      0  7     0x14200                softnet1
 67812  482110      0      0  3     0x14200  bored         softnet0
 69613  421762      0      0  3     0x14200  bored         systqmp
 69474  244487      0      0  3     0x14200  bored         systq
  6630  315988      0      0  3     0x14200  tmoslp        softclockmp
  4182  461805      0      0  3  0x40014200  tmoslp        softclock
*98424  232380      0      0  7  0x40014200                idle0
     1  288021      0      0  3        0x82  wait          init
     0       0     -1      0  3     0x10200  scheduler     swapper

ddb{0}> x/s version
version:        OpenBSD 7.7-beta (GENERIC.MP) #cvs : D2025.03.19.00.00.00: Thu Mar 20 17:06:11 CET 2025\012    root@ot41.obsd-lab.genua.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP\012

Crash is here:
/home/bluhm/openbsd/cvs/src/sys/dev/pci/if_ixl.c:3032
    7a3c:       4c 89 f1                mov    %r14,%rcx
    7a3f:       48 c1 e1 04             shl    $0x4,%rcx
    7a43:       48 8b 55 98             mov    0xffffffffffffff98(%rbp),%rdx
    7a47:       8b 4c 0a 08             mov    0x8(%rdx,%rcx,1),%ecx
/home/bluhm/openbsd/cvs/src/sys/dev/pci/if_ixl.c:3033

  3027          do {
  3028                  txm = &txr->txr_maps[cons];
  3029                  last = txm->txm_eop;
  3030                  txd = &ring[last];
  3031
* 3032                  dtype = txd->cmd & htole64(IXL_TX_DESC_DTYPE_MASK);
  3033                  if (dtype != htole64(IXL_TX_DESC_DTYPE_DONE))
  3034                          break;

Variable last is -1.

My guess is concatenating the TCP packet with LRO has some bugs.
Then an illegal packet causes the TSO send path to crash.

As we disable LRO per default for now, I don't consider this crash
as a show stopper.

bluhm