From: Lucas Gabriel Vuotto <lucas@sexy.is>
Subject: Re: SoftLRO for ixl(4), bnxt(4) and em(4)
To: Jan Klemkow <jan@openbsd.org>
Cc: tech@openbsd.org
Date: Fri, 8 Nov 2024 17:03:52 +0000

  • On Thu, Nov 07, 2024 at 01:10:10AM +0100, Jan Klemkow wrote:
    > Hi,
    > 
    > This diff introduces a software solution for TCP Large Receive Offload
    > (SoftLRO) for network interfaces that don't have hardware support for it.
    > This is needed at least for newer Intel interfaces, as their
    > documentation says that LRO a.k.a. Receive Side Coalescing (RSC) has to
    > be done by software.
    > This diff coalesces TCP segments during the receive interrupt before
    > queueing them.  Thus, our TCP/IP stack has to process fewer packet
    > headers per amount of received data.
    > 
    > I measured receiving performance with Intel XXV710 25 GbE interfaces.
    > It increased from 6 Gbit/s to 23 Gbit/s.
    > 
    > Even though we saturate em(4) without any of these techniques, it's also
    > part of this diff.  I'm interested in whether this diff helps to reach
    > 1 Gbit/s on old or slow hardware.
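
To make the coalescing idea in the quoted text concrete, here is a minimal
Python sketch. It is not the code from the diff: the Segment type, the
MAX_COALESCED limit and the coalesce() helper are hypothetical names chosen
for illustration, and it only merges a segment with the one immediately
before it, whereas a real implementation would also have to check TCP flags,
options and checksums.

```python
from dataclasses import dataclass

# Assumed upper bound on a merged payload, in bytes (illustrative only).
MAX_COALESCED = 65535


@dataclass
class Segment:
    flow: tuple     # (src_ip, src_port, dst_ip, dst_port)
    seq: int        # TCP sequence number of the first payload byte
    payload: bytes


def coalesce(segments):
    """Merge back-to-back, in-order segments that belong to the same flow."""
    merged = []
    for seg in segments:
        last = merged[-1] if merged else None
        if (last is not None
                and last.flow == seg.flow
                # The next segment must start exactly where the last one ended.
                and (last.seq + len(last.payload)) % 2**32 == seg.seq
                and len(last.payload) + len(seg.payload) <= MAX_COALESCED):
            last.payload += seg.payload
        else:
            # Copy so that the caller's segments are never modified.
            merged.append(Segment(seg.flow, seg.seq, seg.payload))
    return merged
```

With three in-order 1448-byte segments of one flow, the stack would see one
4344-byte segment and one set of headers instead of three.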
    
    APU6 go brrrr
    
    obsd-apu6b4-a BEFORE -----------------------------------------------------
    
    IPv4
    client	bandwidth min/avg/max/std-dev = 487.156/497.768/503.871/3.134 Mbps
    server	bandwidth min/avg/max/std-dev = 568.683/589.442/594.683/3.503 Mbps
    
    IPv6
    client	bandwidth min/avg/max/std-dev = 471.821/477.839/483.937/2.166 Mbps
    server	bandwidth min/avg/max/std-dev = 523.542/539.876/546.900/3.649 Mbps
    
    obsd-apu6b4-a AFTER
    
    IPv4
    client	bandwidth min/avg/max/std-dev = 780.729/934.348/937.991/20.019 Mbps
    server	bandwidth min/avg/max/std-dev = 932.727/941.426/942.467/1.140 Mbps
    
    IPv6
    client	bandwidth min/avg/max/std-dev = 928.371/928.761/931.901/0.494 Mbps
    server	bandwidth min/avg/max/std-dev = 920.312/928.495/929.461/1.104 Mbps
    
    obsd-apu6b4-b BEFORE -----------------------------------------------------
    
    IPv4
    client	bandwidth min/avg/max/std-dev = 576.758/589.498/595.304/2.531 Mbps
    server	bandwidth min/avg/max/std-dev = 480.936/497.648/502.814/3.923 Mbps
    
    IPv6
    client	bandwidth min/avg/max/std-dev = 531.915/539.952/546.428/3.009 Mbps
    server	bandwidth min/avg/max/std-dev = 466.355/477.812/482.661/2.429 Mbps
    
    obsd-apu6b4-b AFTER
    
    IPv4
    client	bandwidth min/avg/max/std-dev = 941.409/941.669/942.764/0.300 Mbps
    server	bandwidth min/avg/max/std-dev = 788.386/938.256/942.455/19.652 Mbps
    
    IPv6
    client	bandwidth min/avg/max/std-dev = 925.024/928.644/930.074/0.657 Mbps
    server	bandwidth min/avg/max/std-dev = 921.186/928.626/929.449/1.022 Mbps
    
    Full dmesg of obsd-apu6b4-a follows. The test was performed point to
    point over em3.
    
    obsd-apu6b4-a
    em0 at pci1 dev 0 function 0 "Intel I210 Fiber" rev 0x03: msi, address 00:0d:b9:63:2a:e4
    em1 at pci2 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:63:2a:e5
    em2 at pci3 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:63:2a:e6
    em3 at pci4 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:63:2a:e7
    
    obsd-apu6b4-b
    em0 at pci1 dev 0 function 0 "Intel I210 Fiber" rev 0x03: msi, address 00:0d:b9:63:26:54
    em1 at pci2 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:63:26:55
    em2 at pci3 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:63:26:56
    em3 at pci4 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:63:26:57
    
    OpenBSD 7.6-current (GENERIC.MP) #1: Fri Nov  8 16:08:26 UTC 2024
        lucas@obsd-apu6b4-a.satsfy.net:/usr/src/sys/arch/amd64/compile/GENERIC.MP
    real mem = 4259872768 (4062MB)
    avail mem = 4107530240 (3917MB)
    random: good seed from bootblocks
    mpath0 at root
    scsibus0 at mpath0: 256 targets
    mainbus0 at root
    bios0 at mainbus0: SMBIOS rev. 3.0 @ 0xcfe8b040 (13 entries)
    bios0: vendor coreboot version "v4.12.0.5" date 09/25/2020
    bios0: PC Engines apu6
    acpi0 at bios0: ACPI 6.0
    acpi0: sleep states S0 S1 S4 S5
    acpi0: tables DSDT FACP SSDT MCFG TPM2 APIC HEST SSDT SSDT DRTM HPET
    acpi0: wakeup devices PBR4(S4) PBR5(S4) PBR6(S4) PBR7(S4) PBR8(S4) UOH1(S3) UOH2(S3) UOH3(S3) UOH4(S3) UOH5(S3) UOH6(S3) XHC0(S4)
    acpitimer0 at acpi0: 3579545 Hz, 32 bits
    acpimcfg0 at acpi0
    acpimcfg0: addr 0xf8000000, bus 0-64
    acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
    cpu0 at mainbus0: apid 0 (boot processor)
    cpu0: AMD GX-412TC SOC, 998.20 MHz, 16-30-01, patch 07030105
    cpu0: cpuid 1 edx=178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT> ecx=36d8220b<SSE3,PCLMUL,MWAIT,SSSE3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C>
    cpu0: cpuid 6 eax=4<ARAT> ecx=1<EFFFREQ>
    cpu0: cpuid 7.0 ebx=8<BMI1>
    cpu0: cpuid d.1 eax=1<XSAVEOPT>
    cpu0: cpuid 80000001 edx=2fd3fbff<NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG> ecx=1d4037ff<LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TOPEXT,DBKP,PERFTSC,PCTRL3>
    cpu0: cpuid 80000007 edx=33d9<HWPSTATE,ITSC>
    cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 2-way I-cache, 2MB 64b/line 16-way L2 cache
    cpu0: smt 0, core 0, package 0
    mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
    cpu0: apic clock running at 99MHz
    cpu0: mwait min=64, max=64, IBE
    cpu1 at mainbus0: apid 1 (application processor)
    cpu1: AMD GX-412TC SOC, 998.24 MHz, 16-30-01, patch 07030105
    cpu1: smt 0, core 1, package 0
    cpu2 at mainbus0: apid 2 (application processor)
    cpu2: AMD GX-412TC SOC, 998.29 MHz, 16-30-01, patch 07030105
    cpu2: smt 0, core 2, package 0
    cpu3 at mainbus0: apid 3 (application processor)
    cpu3: AMD GX-412TC SOC, 998.41 MHz, 16-30-01, patch 07030105
    cpu3: smt 0, core 3, package 0
    ioapic0 at mainbus0: apid 4 pa 0xfec00000, version 21, 24 pins
    ioapic1 at mainbus0: apid 5 pa 0xfec20000, version 21, 32 pins
    acpihpet0 at acpi0: 14318180 Hz
    acpiprt0 at acpi0: bus 0 (PCI0)
    acpiprt1 at acpi0: bus 1 (PBR4)
    acpiprt2 at acpi0: bus 2 (PBR5)
    acpiprt3 at acpi0: bus 3 (PBR6)
    acpiprt4 at acpi0: bus 4 (PBR7)
    acpiprt5 at acpi0: bus -1 (PBR8)
    acpipci0 at acpi0 PCI0: 0x00000000 0x00000011 0x00000001
    acpicmos0 at acpi0
    com0 at acpi0 COM1 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
    com0: console
    com1 at acpi0 COM2 addr 0x2f8/0x8 irq 3: ns16550a, 16 byte fifo
    amdgpio0 at acpi0 GPIO uid 0 addr 0xfed81500/0x300 irq 7, 184 pins
    "PRP0001" at acpi0 not configured
    "PRP0001" at acpi0 not configured
    "PRP0001" at acpi0 not configured
    "PRP0001" at acpi0 not configured
    "PRP0001" at acpi0 not configured
    "PRP0001" at acpi0 not configured
    "BOOT0000" at acpi0 not configured
    acpicpu0 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
    acpicpu1 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
    acpicpu2 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
    acpicpu3 at acpi0: C2(0@400 io@0x1771), C1(@1 halt!), PSS
    acpitz0 at acpi0: critical temperature is 115 degC
    cpu0: 998 MHz: speeds: 1000 800 600 MHz
    pci0 at mainbus0 bus 0
    pchb0 at pci0 dev 0 function 0 "AMD 16h Root Complex" rev 0x00
    vendor "AMD", unknown product 0x1567 (class system subclass IOMMU, rev 0x00) at pci0 dev 0 function 2 not configured
    pchb1 at pci0 dev 2 function 0 "AMD 16h Host" rev 0x00
    ppb0 at pci0 dev 2 function 1 "AMD 16h PCIE" rev 0x00: msi
    pci1 at ppb0 bus 1
    em0 at pci1 dev 0 function 0 "Intel I210 Fiber" rev 0x03: msi, address 00:0d:b9:63:2a:e4
    ppb1 at pci0 dev 2 function 2 "AMD 16h PCIE" rev 0x00: msi
    pci2 at ppb1 bus 2
    em1 at pci2 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:63:2a:e5
    ppb2 at pci0 dev 2 function 3 "AMD 16h PCIE" rev 0x00: msi
    pci3 at ppb2 bus 3
    em2 at pci3 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:63:2a:e6
    ppb3 at pci0 dev 2 function 4 "AMD 16h PCIE" rev 0x00: msi
    pci4 at ppb3 bus 4
    em3 at pci4 dev 0 function 0 "Intel I211" rev 0x03: msi, address 00:0d:b9:63:2a:e7
    ccp0 at pci0 dev 8 function 0 "AMD 16h Crypto" rev 0x00
    xhci0 at pci0 dev 16 function 0 "AMD Bolton xHCI" rev 0x11: msix, xHCI 1.0
    xhci0: halt timeout
    xhci0: reset timeout
    xhci0: init failed, error=5
    ahci0 at pci0 dev 17 function 0 "AMD Hudson-2 SATA" rev 0x40: apic 4 int 19, AHCI 1.3
    ahci0: port 0: 6.0Gb/s
    scsibus1 at ahci0: 32 targets
    sd0 at scsibus1 targ 0 lun 0: <ATA, SATA SSD, SBFM> t10.ATA_SATA_SSD_800B072C1B0800194225
    sd0: 28626MB, 512 bytes/sector, 58626288 sectors, thin
    ehci0 at pci0 dev 19 function 0 "AMD Hudson-2 USB2" rev 0x39: apic 4 int 18
    usb0 at ehci0: USB revision 2.0
    uhub0 at usb0 configuration 1 interface 0 "AMD EHCI root hub" rev 2.00/1.00 addr 1
    piixpm0 at pci0 dev 20 function 0 "AMD Hudson-2 SMBus" rev 0x42: SMI
    iic0 at piixpm0
    iic1 at piixpm0
    iic1: addr 0x4c 3e=00 48=00 4a=00 4e=00 fc=00 fe=00 words 00=ffff 01=ffff 02=ffff 03=ffff 04=ffff 05=ffff 06=ffff 07=ffff
    pcib0 at pci0 dev 20 function 3 "AMD Hudson-2 LPC" rev 0x11
    sdhc0 at pci0 dev 20 function 7 "AMD Bolton SD/MMC" rev 0x01: apic 4 int 16
    sdhc0: SDHC 2.00, 50 MHz base clock
    sdmmc0 at sdhc0: 4-bit, sd high-speed, mmc high-speed, dma
    pchb2 at pci0 dev 24 function 0 "AMD 16h Link Cfg" rev 0x00
    pchb3 at pci0 dev 24 function 1 "AMD 16h Address Map" rev 0x00
    pchb4 at pci0 dev 24 function 2 "AMD 16h DRAM Cfg" rev 0x00
    km0 at pci0 dev 24 function 3 "AMD 16h Misc Cfg" rev 0x00
    pchb5 at pci0 dev 24 function 4 "AMD 16h CPU Power" rev 0x00
    pchb6 at pci0 dev 24 function 5 "AMD 16h Misc Cfg" rev 0x00
    isa0 at pcib0
    isadma0 at isa0
    com2 at isa0 port 0x3e8/8 irq 5: ns16550a, 16 byte fifo
    pcppi0 at isa0 port 0x61
    spkr0 at pcppi0
    lpt0 at isa0 port 0x378/4 irq 7
    intr_establish: pic ioapic0 pin 7: can't share type 3 with 2
    wbsio0 at isa0 port 0x2e/2: NCT5104D rev 0x53
    vmm0 at mainbus0: SVM/RVI
    uhub1 at uhub0 port 1 configuration 1 interface 0 "Advanced Micro Devices Hub" rev 2.00/0.18 addr 2
    vscsi0 at root
    scsibus2 at vscsi0: 256 targets
    softraid0 at root
    scsibus3 at softraid0: 256 targets
    root on sd0a (e9070905c03e760b.a) swap on sd0b dump on sd0b
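
For anyone who wants the before/after numbers as ratios, the small Python
sketch below takes the avg field from the obsd-apu6b4-a result lines above
and divides AFTER by BEFORE. The avg_mbps() helper and the dictionary layout
are my own illustration and not part of any benchmark tool.

```python
def avg_mbps(line):
    """Return the avg field of a 'min/avg/max/std-dev = a/b/c/d Mbps' line."""
    values = line.split("=")[1].split()[0].split("/")
    return float(values[1])


# Result lines for obsd-apu6b4-a, copied from the report above.
before = {
    "IPv4 client": "bandwidth min/avg/max/std-dev = 487.156/497.768/503.871/3.134 Mbps",
    "IPv4 server": "bandwidth min/avg/max/std-dev = 568.683/589.442/594.683/3.503 Mbps",
    "IPv6 client": "bandwidth min/avg/max/std-dev = 471.821/477.839/483.937/2.166 Mbps",
    "IPv6 server": "bandwidth min/avg/max/std-dev = 523.542/539.876/546.900/3.649 Mbps",
}
after = {
    "IPv4 client": "bandwidth min/avg/max/std-dev = 780.729/934.348/937.991/20.019 Mbps",
    "IPv4 server": "bandwidth min/avg/max/std-dev = 932.727/941.426/942.467/1.140 Mbps",
    "IPv6 client": "bandwidth min/avg/max/std-dev = 928.371/928.761/931.901/0.494 Mbps",
    "IPv6 server": "bandwidth min/avg/max/std-dev = 920.312/928.495/929.461/1.104 Mbps",
}

# Speedup factor per test: average throughput after / average throughput before.
speedups = {name: avg_mbps(after[name]) / avg_mbps(before[name]) for name in before}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```

On these figures the average throughput improves by a factor of roughly 1.6
to 1.9, depending on the direction and address family.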
    