From: Alexander Bluhm Subject: Re: SoftLRO for ixl(4), bnxt(4) and em(4) To: Jan Klemkow Cc: tech@openbsd.org, Janne Johansson , Yuichiro NAITO Date: Thu, 20 Mar 2025 17:37:30 +0100 On Thu, Mar 20, 2025 at 03:20:25PM +0100, Alexander Bluhm wrote: > On Tue, Mar 04, 2025 at 08:02:31PM +0100, Jan Klemkow wrote: > > On Fri, Nov 15, 2024 at 11:30:08AM GMT, Jan Klemkow wrote: > > > On Thu, Nov 07, 2024 at 11:30:26AM GMT, David Gwynne wrote: > > > > On Thu, Nov 07, 2024 at 01:10:10AM +0100, Jan Klemkow wrote: > > > > > This diff introduces a software solution for TCP Large Receive Offload > > > > > (SoftLRO) for network interfaces don't hat hardware support for it. > > > > > This is needes at least for newer Intel interfaces as their > > > > > documentation said that LRO a.k.a. Receive Side Coalescing (RSC) has to > > > > > be done by software. > > > > > This diff coalesces TCP segments during the receive interrupt before > > > > > queueing them. Thus, our TCP/IP stack has to process less packet > > > > > headers per amount of received data. > > > > > > > > > > I measured receiving performance with Intel XXV710 25 GbE interfaces. > > > > > It increased from 6 Gbit/s to 23 Gbit/s. > > > > > > > > > > Even if we saturate em(4) without any of these technique its also part > > > > > this diff. I'm interested if this diff helps to reach 1 Gbit/s on old > > > > > or slow hardware. > > > > > > > > > > I also add bnxt(4) to this diff to increase test coverage. If you want > > > > > to tests this implementation with your favorite interface, just replace > > > > > the ml_enqueue() call with the new tcp_softlro_enqueue() (as seen > > > > > below). It should work with all kind network interfaces. > > > > > > > > > > Any comments and tests reports are welcome. > > > > > > > > nice. > > > > > > > > i would argue this should be ether_softlro_enqueue and put in > > > > if_ethersubr.c because it's specific to ethernet interfaces. we don't > > > > really have any other type of interface that bundles reception of > > > > packets that we can take advantage of like this, and internally it > > > > assumes it's pulling ethernet packets apart. > > > > > > > > aside from that, just a few comments on the code. > > > > > > I adapted your comments in the diff below. > > > > I refactored the SoftLRO diff. You just need to add the flags IFXF_LRO > > / IFCAP_LRO, and repalce ml_enqueue() with tcp_softlro_enqueue() to > > enable this on you favorit network device. > > > > Janne: I adjusted your diff with correct headers. But, I'm unable to > > test this part of the diff below, due to lack of hardware. Could you > > test it again? > > > > Yuichiro: Could you also retest your UDP/TCP forwarding test? I added a > > short path for non-TCP packets in the ixl(4) driver. Maybe its better > > now. > > > > Further tests and comments are welcome. > > As release lock will come soon, we should be careful. I think we > can add the code if we ensure that default behavior does not change > anything. > > We should start with ixl(4), ifconfig tcplro flag disabled per > default. Then we can improve the TCP code in tree and add other > drivers later. I found some issues like ethernet padding and > TCP option parsing that should be improved. But doing this > on top of an initial commit is easier. > > With some minor changes I am OK with commiting the diff below. > > - remove all drivers except ixl(4) from the diff > - turn off ifconfig tcplro per default > - make sure that ixl(4) logic does not change, if tcplro is off > - I have fixed the livelocked logic in ixl(4) > - I renamed the function to tcp_enqueue_lro(), it is more consistent > with the TSO functions. > > As said before, tcp_softlro() needs more love, but it want to > do this in tree. > > With my comments, jan@'s diff looks like this and would be OK bluhm@ Oops, I got a uvm fault in ixl_txeof() with this diff. Test was sending single stream TCP from Linux to Linux machine while OpenBSD was forwarding. ixl(4) LRO was activated. So it looks like it was sending a packet that was previously received with LRO. uvm_fault(0xffffffff82ab4180, 0xfffffd90d6fd4ff8, 0, 1) -> e kernel: page fault trap, code=0 Stopped at ixl_txeof+0x197: movl 0x8(%rdx,%rcx,1),%ecx TID PID UID PRFLAGS PFLAGS CPU COMMAND 314157 74541 0 0x14000 0x200 2 softnet3 455485 27135 0 0x14000 0x200 1 softnet2 432414 17394 0 0x14000 0x200 3 softnet1 ixl_txeof(ffff800000188000,ffff800000e1cf00) at ixl_txeof+0x197 ixl_intr_vector(ffff800000185300) at ixl_intr_vector+0x57 intr_handler(ffff800032c770c0,ffff800000175000) at intr_handler+0x91 Xintr_ioapic_edge28_untramp() at Xintr_ioapic_edge28_untramp+0x18f acpicpu_idle() at acpicpu_idle+0x239 sched_idle(ffffffff829f2ff0) at sched_idle+0x298 end trace frame: 0x0, count: 9 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. ddb{0}> show panic *cpu0: uvm_fault(0xffffffff82ab4180, 0xfffffd90d6fd4ff8, 0, 1) -> e ddb{0}> trace ixl_txeof(ffff800000188000,ffff800000e1cf00) at ixl_txeof+0x197 ixl_intr_vector(ffff800000185300) at ixl_intr_vector+0x57 intr_handler(ffff800032c770c0,ffff800000175000) at intr_handler+0x91 Xintr_ioapic_edge28_untramp() at Xintr_ioapic_edge28_untramp+0x18f acpicpu_idle() at acpicpu_idle+0x239 sched_idle(ffffffff829f2ff0) at sched_idle+0x298 end trace frame: 0x0, count: -6 ddb{0}> show register rdi 0x4 rsi 0xffff800001f6e5c0 rbp 0xffff800032c77010 rbx 0xffff800001f6c000 rdx 0xfffffd80d6fd5000 rcx 0xffffffff0 rax 0x192 r8 0x8 r9 0 r10 0x2096f2a745fa867e r11 0x26076eed88feaba6 r12 0x196 r13 0x1 r14 0xffffffff r15 0x192 rip 0xffffffff81d8a047 ixl_txeof+0x197 cs 0x8 rflags 0x10206 __ALIGN_SIZE+0xf206 rsp 0xffff800032c76fa0 ss 0x10 ixl_txeof+0x197: movl 0x8(%rdx,%rcx,1),%ecx ddb{0}> ps PID TID PPID UID S FLAGS WAIT COMMAND 38518 412339 93610 0 3 0x10008a kqread ssh 47141 497841 69680 1000 3 0x100082 kqread tcpbench 69680 482468 41677 1000 3 0x10008a sigsusp sh 41677 137967 1 0 3 0x100088 sigsusp ksh 93610 431262 70721 0 3 0x82 piperd perl 70721 340494 84931 0 3 0x10008a sigsusp ksh 84931 281543 76631 0 3 0x98 kqread sshd-session 76631 384998 40558 0 3 0x92 kqread sshd-session 2806 202482 1 0 3 0x100083 ttyin getty 99278 67342 1 0 3 0x100098 kqread cron 14460 108200 1 99 3 0x1100090 kqread sndiod 49145 492411 1 110 3 0x100090 kqread sndiod 88648 284214 17435 95 3 0x1100092 kqread smtpd 85068 231497 17435 103 3 0x1100092 kqread smtpd 48744 446192 17435 95 3 0x1100092 kqread smtpd 81038 44933 17435 95 3 0x100092 kqread smtpd 67852 435382 17435 95 3 0x1100092 kqread smtpd 95510 326790 17435 95 3 0x1100092 kqread smtpd 17435 136031 1 0 3 0x100080 kqread smtpd 66494 299815 44745 91 3 0x92 kqread snmpd_metrics 2427 408672 44745 91 3 0x1100092 kqread snmpd 44745 351156 1 0 3 0x100080 kqread snmpd 40558 490400 1 0 3 0x88 kqread sshd 33095 460368 0 0 3 0x14200 acct acct 86928 490746 0 0 3 0x14280 nfsidl nfsio 40308 501410 0 0 3 0x14280 nfsidl nfsio 14683 369864 0 0 3 0x14280 nfsidl nfsio 86115 363991 0 0 3 0x14280 nfsidl nfsio 80631 145485 1 0 3 0x100080 kqread ntpd 62683 483071 70890 83 3 0x100092 kqread ntpd 70890 270623 1 83 3 0x1100092 kqread ntpd 59526 317744 53327 74 3 0x1100092 bpf pflogd 53327 166856 1 0 3 0x80 sbwait pflogd 46009 192808 60738 73 3 0x1100090 kqread syslogd 60738 257468 1 0 3 0x100082 sbwait syslogd 27089 12018 1 0 3 0x100080 kqread resolvd 34559 118884 58086 77 3 0x100092 kqread dhcpleased 50650 31850 58086 77 3 0x100092 kqread dhcpleased 58086 260708 1 0 3 0x80 kqread dhcpleased 57483 42427 93971 115 3 0x100092 kqread slaacd 37908 42219 93971 115 3 0x100092 kqread slaacd 93971 13019 1 0 3 0x100080 kqread slaacd 82672 132677 0 0 3 0x14200 bored wsdisplay1 85742 374184 0 0 3 0x14200 bored i915_flip 72724 322745 0 0 3 0x14200 bored i915_modeset 36100 66877 0 0 3 0x14200 bored card0-crtc2 71449 154528 0 0 3 0x14200 bored card0-crtc1 4004 449792 0 0 3 0x14200 bored card0-crtc0 66389 398290 0 0 3 0x14200 bored ttm 25629 195671 0 0 3 0x14200 bored i915-unordered 30993 3026 0 0 3 0x14200 bored i915-dp 64044 17892 0 0 3 0x14200 bored i915 50213 284154 0 0 3 0x14200 bored smr 40695 150715 0 0 3 0x14200 pgzero zerothread 33090 235644 0 0 3 0x14200 aiodoned aiodoned 89464 61329 0 0 3 0x14200 syncer update 90507 444068 0 0 3 0x14200 cleaner cleaner 60811 26863 0 0 3 0x14200 reaper reaper 23643 261108 0 0 3 0x14200 pgdaemon pagedaemon 48598 436260 0 0 3 0x14200 usbtsk usbtask 44951 160405 0 0 3 0x14200 usbatsk usbatsk 9085 470607 0 0 3 0x14200 bored drmtskl 51075 14158 0 0 3 0x14200 bored drmlwq 57977 332542 0 0 3 0x14200 bored drmlwq 31093 34074 0 0 3 0x14200 bored drmlwq 10444 303766 0 0 3 0x14200 bored drmlwq 63278 505564 0 0 3 0x14200 bored drmubwq 87295 374846 0 0 3 0x14200 bored drmubwq 13798 142371 0 0 3 0x14200 bored drmubwq 25664 191160 0 0 3 0x14200 bored drmubwq 1920 14491 0 0 3 0x14200 bored drmhpwq 32074 194715 0 0 3 0x14200 bored drmhpwq 87299 408596 0 0 3 0x14200 bored drmhpwq 84700 196185 0 0 3 0x14200 bored drmhpwq 49492 134485 0 0 3 0x14200 bored drmwq 85317 16007 0 0 3 0x14200 bored drmwq 69861 222376 0 0 3 0x14200 bored drmwq 58837 418320 0 0 3 0x14200 bored drmwq 92915 102135 0 0 3 0x40014200 acpi0 acpi0 62796 101256 0 0 3 0x40014200 idle3 98829 504539 0 0 3 0x40014200 idle2 74240 7414 0 0 3 0x40014200 idle1 52393 30619 0 0 3 0x14200 bored sensors 74541 314157 0 0 7 0x14200 softnet3 27135 455485 0 0 7 0x14200 softnet2 17394 432414 0 0 7 0x14200 softnet1 67812 482110 0 0 3 0x14200 bored softnet0 69613 421762 0 0 3 0x14200 bored systqmp 69474 244487 0 0 3 0x14200 bored systq 6630 315988 0 0 3 0x14200 tmoslp softclockmp 4182 461805 0 0 3 0x40014200 tmoslp softclock *98424 232380 0 0 7 0x40014200 idle0 1 288021 0 0 3 0x82 wait init 0 0 -1 0 3 0x10200 scheduler swapper ddb{0}> x/s version version: OpenBSD 7.7-beta (GENERIC.MP) #cvs : D2025.03.19.00.00.00: Thu Mar 20 17:06:11 CET 2025\012 root@ot41.obsd-lab.genua.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP\012 Crash is here: /home/bluhm/openbsd/cvs/src/sys/dev/pci/if_ixl.c:3032 7a3c: 4c 89 f1 mov %r14,%rcx 7a3f: 48 c1 e1 04 shl $0x4,%rcx 7a43: 48 8b 55 98 mov 0xffffffffffffff98(%rbp),%rdx 7a47: 8b 4c 0a 08 mov 0x8(%rdx,%rcx,1),%ecx /home/bluhm/openbsd/cvs/src/sys/dev/pci/if_ixl.c:3033 3027 do { 3028 txm = &txr->txr_maps[cons]; 3029 last = txm->txm_eop; 3030 txd = &ring[last]; 3031 * 3032 dtype = txd->cmd & htole64(IXL_TX_DESC_DTYPE_MASK); 3033 if (dtype != htole64(IXL_TX_DESC_DTYPE_DONE)) 3034 break; Variable last is -1. My guess is concatenating the TCP packet with LRO has some bugs. Then an illegal packet causes the TSO send path to crash. As we disable LRO per default for now, I don't consider this crash as a show stopper. bluhm