Download raw body.
SoftLRO for ixl(4), bnxt(4) and em(4)
On Thu, Mar 20, 2025 at 03:20:25PM +0100, Alexander Bluhm wrote:
> On Tue, Mar 04, 2025 at 08:02:31PM +0100, Jan Klemkow wrote:
> > On Fri, Nov 15, 2024 at 11:30:08AM GMT, Jan Klemkow wrote:
> > > On Thu, Nov 07, 2024 at 11:30:26AM GMT, David Gwynne wrote:
> > > > On Thu, Nov 07, 2024 at 01:10:10AM +0100, Jan Klemkow wrote:
> > > > > This diff introduces a software solution for TCP Large Receive Offload
> > > > > (SoftLRO) for network interfaces don't hat hardware support for it.
> > > > > This is needes at least for newer Intel interfaces as their
> > > > > documentation said that LRO a.k.a. Receive Side Coalescing (RSC) has to
> > > > > be done by software.
> > > > > This diff coalesces TCP segments during the receive interrupt before
> > > > > queueing them. Thus, our TCP/IP stack has to process less packet
> > > > > headers per amount of received data.
> > > > >
> > > > > I measured receiving performance with Intel XXV710 25 GbE interfaces.
> > > > > It increased from 6 Gbit/s to 23 Gbit/s.
> > > > >
> > > > > Even if we saturate em(4) without any of these technique its also part
> > > > > this diff. I'm interested if this diff helps to reach 1 Gbit/s on old
> > > > > or slow hardware.
> > > > >
> > > > > I also add bnxt(4) to this diff to increase test coverage. If you want
> > > > > to tests this implementation with your favorite interface, just replace
> > > > > the ml_enqueue() call with the new tcp_softlro_enqueue() (as seen
> > > > > below). It should work with all kind network interfaces.
> > > > >
> > > > > Any comments and tests reports are welcome.
> > > >
> > > > nice.
> > > >
> > > > i would argue this should be ether_softlro_enqueue and put in
> > > > if_ethersubr.c because it's specific to ethernet interfaces. we don't
> > > > really have any other type of interface that bundles reception of
> > > > packets that we can take advantage of like this, and internally it
> > > > assumes it's pulling ethernet packets apart.
> > > >
> > > > aside from that, just a few comments on the code.
> > >
> > > I adapted your comments in the diff below.
> >
> > I refactored the SoftLRO diff. You just need to add the flags IFXF_LRO
> > / IFCAP_LRO, and repalce ml_enqueue() with tcp_softlro_enqueue() to
> > enable this on you favorit network device.
> >
> > Janne: I adjusted your diff with correct headers. But, I'm unable to
> > test this part of the diff below, due to lack of hardware. Could you
> > test it again?
> >
> > Yuichiro: Could you also retest your UDP/TCP forwarding test? I added a
> > short path for non-TCP packets in the ixl(4) driver. Maybe its better
> > now.
> >
> > Further tests and comments are welcome.
>
> As release lock will come soon, we should be careful. I think we
> can add the code if we ensure that default behavior does not change
> anything.
>
> We should start with ixl(4), ifconfig tcplro flag disabled per
> default. Then we can improve the TCP code in tree and add other
> drivers later. I found some issues like ethernet padding and
> TCP option parsing that should be improved. But doing this
> on top of an initial commit is easier.
>
> With some minor changes I am OK with commiting the diff below.
>
> - remove all drivers except ixl(4) from the diff
> - turn off ifconfig tcplro per default
> - make sure that ixl(4) logic does not change, if tcplro is off
> - I have fixed the livelocked logic in ixl(4)
> - I renamed the function to tcp_enqueue_lro(), it is more consistent
> with the TSO functions.
>
> As said before, tcp_softlro() needs more love, but it want to
> do this in tree.
>
> With my comments, jan@'s diff looks like this and would be OK bluhm@
Oops, I got a uvm fault in ixl_txeof() with this diff. Test was
sending single stream TCP from Linux to Linux machine while OpenBSD
was forwarding. ixl(4) LRO was activated. So it looks like it was
sending a packet that was previously received with LRO.
uvm_fault(0xffffffff82ab4180, 0xfffffd90d6fd4ff8, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at ixl_txeof+0x197: movl 0x8(%rdx,%rcx,1),%ecx
TID PID UID PRFLAGS PFLAGS CPU COMMAND
314157 74541 0 0x14000 0x200 2 softnet3
455485 27135 0 0x14000 0x200 1 softnet2
432414 17394 0 0x14000 0x200 3 softnet1
ixl_txeof(ffff800000188000,ffff800000e1cf00) at ixl_txeof+0x197
ixl_intr_vector(ffff800000185300) at ixl_intr_vector+0x57
intr_handler(ffff800032c770c0,ffff800000175000) at intr_handler+0x91
Xintr_ioapic_edge28_untramp() at Xintr_ioapic_edge28_untramp+0x18f
acpicpu_idle() at acpicpu_idle+0x239
sched_idle(ffffffff829f2ff0) at sched_idle+0x298
end trace frame: 0x0, count: 9
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports. Insufficient info makes it difficult to find and fix bugs.
ddb{0}> show panic
*cpu0: uvm_fault(0xffffffff82ab4180, 0xfffffd90d6fd4ff8, 0, 1) -> e
ddb{0}> trace
ixl_txeof(ffff800000188000,ffff800000e1cf00) at ixl_txeof+0x197
ixl_intr_vector(ffff800000185300) at ixl_intr_vector+0x57
intr_handler(ffff800032c770c0,ffff800000175000) at intr_handler+0x91
Xintr_ioapic_edge28_untramp() at Xintr_ioapic_edge28_untramp+0x18f
acpicpu_idle() at acpicpu_idle+0x239
sched_idle(ffffffff829f2ff0) at sched_idle+0x298
end trace frame: 0x0, count: -6
ddb{0}> show register
rdi 0x4
rsi 0xffff800001f6e5c0
rbp 0xffff800032c77010
rbx 0xffff800001f6c000
rdx 0xfffffd80d6fd5000
rcx 0xffffffff0
rax 0x192
r8 0x8
r9 0
r10 0x2096f2a745fa867e
r11 0x26076eed88feaba6
r12 0x196
r13 0x1
r14 0xffffffff
r15 0x192
rip 0xffffffff81d8a047 ixl_txeof+0x197
cs 0x8
rflags 0x10206 __ALIGN_SIZE+0xf206
rsp 0xffff800032c76fa0
ss 0x10
ixl_txeof+0x197: movl 0x8(%rdx,%rcx,1),%ecx
ddb{0}> ps
PID TID PPID UID S FLAGS WAIT COMMAND
38518 412339 93610 0 3 0x10008a kqread ssh
47141 497841 69680 1000 3 0x100082 kqread tcpbench
69680 482468 41677 1000 3 0x10008a sigsusp sh
41677 137967 1 0 3 0x100088 sigsusp ksh
93610 431262 70721 0 3 0x82 piperd perl
70721 340494 84931 0 3 0x10008a sigsusp ksh
84931 281543 76631 0 3 0x98 kqread sshd-session
76631 384998 40558 0 3 0x92 kqread sshd-session
2806 202482 1 0 3 0x100083 ttyin getty
99278 67342 1 0 3 0x100098 kqread cron
14460 108200 1 99 3 0x1100090 kqread sndiod
49145 492411 1 110 3 0x100090 kqread sndiod
88648 284214 17435 95 3 0x1100092 kqread smtpd
85068 231497 17435 103 3 0x1100092 kqread smtpd
48744 446192 17435 95 3 0x1100092 kqread smtpd
81038 44933 17435 95 3 0x100092 kqread smtpd
67852 435382 17435 95 3 0x1100092 kqread smtpd
95510 326790 17435 95 3 0x1100092 kqread smtpd
17435 136031 1 0 3 0x100080 kqread smtpd
66494 299815 44745 91 3 0x92 kqread snmpd_metrics
2427 408672 44745 91 3 0x1100092 kqread snmpd
44745 351156 1 0 3 0x100080 kqread snmpd
40558 490400 1 0 3 0x88 kqread sshd
33095 460368 0 0 3 0x14200 acct acct
86928 490746 0 0 3 0x14280 nfsidl nfsio
40308 501410 0 0 3 0x14280 nfsidl nfsio
14683 369864 0 0 3 0x14280 nfsidl nfsio
86115 363991 0 0 3 0x14280 nfsidl nfsio
80631 145485 1 0 3 0x100080 kqread ntpd
62683 483071 70890 83 3 0x100092 kqread ntpd
70890 270623 1 83 3 0x1100092 kqread ntpd
59526 317744 53327 74 3 0x1100092 bpf pflogd
53327 166856 1 0 3 0x80 sbwait pflogd
46009 192808 60738 73 3 0x1100090 kqread syslogd
60738 257468 1 0 3 0x100082 sbwait syslogd
27089 12018 1 0 3 0x100080 kqread resolvd
34559 118884 58086 77 3 0x100092 kqread dhcpleased
50650 31850 58086 77 3 0x100092 kqread dhcpleased
58086 260708 1 0 3 0x80 kqread dhcpleased
57483 42427 93971 115 3 0x100092 kqread slaacd
37908 42219 93971 115 3 0x100092 kqread slaacd
93971 13019 1 0 3 0x100080 kqread slaacd
82672 132677 0 0 3 0x14200 bored wsdisplay1
85742 374184 0 0 3 0x14200 bored i915_flip
72724 322745 0 0 3 0x14200 bored i915_modeset
36100 66877 0 0 3 0x14200 bored card0-crtc2
71449 154528 0 0 3 0x14200 bored card0-crtc1
4004 449792 0 0 3 0x14200 bored card0-crtc0
66389 398290 0 0 3 0x14200 bored ttm
25629 195671 0 0 3 0x14200 bored i915-unordered
30993 3026 0 0 3 0x14200 bored i915-dp
64044 17892 0 0 3 0x14200 bored i915
50213 284154 0 0 3 0x14200 bored smr
40695 150715 0 0 3 0x14200 pgzero zerothread
33090 235644 0 0 3 0x14200 aiodoned aiodoned
89464 61329 0 0 3 0x14200 syncer update
90507 444068 0 0 3 0x14200 cleaner cleaner
60811 26863 0 0 3 0x14200 reaper reaper
23643 261108 0 0 3 0x14200 pgdaemon pagedaemon
48598 436260 0 0 3 0x14200 usbtsk usbtask
44951 160405 0 0 3 0x14200 usbatsk usbatsk
9085 470607 0 0 3 0x14200 bored drmtskl
51075 14158 0 0 3 0x14200 bored drmlwq
57977 332542 0 0 3 0x14200 bored drmlwq
31093 34074 0 0 3 0x14200 bored drmlwq
10444 303766 0 0 3 0x14200 bored drmlwq
63278 505564 0 0 3 0x14200 bored drmubwq
87295 374846 0 0 3 0x14200 bored drmubwq
13798 142371 0 0 3 0x14200 bored drmubwq
25664 191160 0 0 3 0x14200 bored drmubwq
1920 14491 0 0 3 0x14200 bored drmhpwq
32074 194715 0 0 3 0x14200 bored drmhpwq
87299 408596 0 0 3 0x14200 bored drmhpwq
84700 196185 0 0 3 0x14200 bored drmhpwq
49492 134485 0 0 3 0x14200 bored drmwq
85317 16007 0 0 3 0x14200 bored drmwq
69861 222376 0 0 3 0x14200 bored drmwq
58837 418320 0 0 3 0x14200 bored drmwq
92915 102135 0 0 3 0x40014200 acpi0 acpi0
62796 101256 0 0 3 0x40014200 idle3
98829 504539 0 0 3 0x40014200 idle2
74240 7414 0 0 3 0x40014200 idle1
52393 30619 0 0 3 0x14200 bored sensors
74541 314157 0 0 7 0x14200 softnet3
27135 455485 0 0 7 0x14200 softnet2
17394 432414 0 0 7 0x14200 softnet1
67812 482110 0 0 3 0x14200 bored softnet0
69613 421762 0 0 3 0x14200 bored systqmp
69474 244487 0 0 3 0x14200 bored systq
6630 315988 0 0 3 0x14200 tmoslp softclockmp
4182 461805 0 0 3 0x40014200 tmoslp softclock
*98424 232380 0 0 7 0x40014200 idle0
1 288021 0 0 3 0x82 wait init
0 0 -1 0 3 0x10200 scheduler swapper
ddb{0}> x/s version
version: OpenBSD 7.7-beta (GENERIC.MP) #cvs : D2025.03.19.00.00.00: Thu Mar 20 17:06:11 CET 2025\012 root@ot41.obsd-lab.genua.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP\012
Crash is here:
/home/bluhm/openbsd/cvs/src/sys/dev/pci/if_ixl.c:3032
7a3c: 4c 89 f1 mov %r14,%rcx
7a3f: 48 c1 e1 04 shl $0x4,%rcx
7a43: 48 8b 55 98 mov 0xffffffffffffff98(%rbp),%rdx
7a47: 8b 4c 0a 08 mov 0x8(%rdx,%rcx,1),%ecx
/home/bluhm/openbsd/cvs/src/sys/dev/pci/if_ixl.c:3033
3027 do {
3028 txm = &txr->txr_maps[cons];
3029 last = txm->txm_eop;
3030 txd = &ring[last];
3031
* 3032 dtype = txd->cmd & htole64(IXL_TX_DESC_DTYPE_MASK);
3033 if (dtype != htole64(IXL_TX_DESC_DTYPE_DONE))
3034 break;
Variable last is -1.
My guess is concatenating the TCP packet with LRO has some bugs.
Then an illegal packet causes the TSO send path to crash.
As we disable LRO per default for now, I don't consider this crash
as a show stopper.
bluhm
SoftLRO for ixl(4), bnxt(4) and em(4)