From: Yuichiro NAITO
Subject: iavf patch [4/4]: TX queue direct dispatch
To: tech@openbsd.org
Date: Fri, 07 Feb 2025 16:26:34 +0900

After applying the previous 3 patches, while testing packet forwarding
via iavf interfaces, I found that the packet transfer performance is
unstable. In the worst case, the ipgen (*1) results are as follows.

*1: https://github.com/iij/ipgen

framesize|0G   1G   2G   3G   4G   5G   6G   7G   8G   9G   10Gbps
---------+----+----+----+----+----+----+----+----+----+----+
      64 |##                                                     297.60Mbps,  581259/14880952pps,  3.91%
     128 |###                                                    592.77Mbps,  578873/ 8445945pps,  6.85%
     512 |#############                                         2406.01Mbps,  587406/ 2349624pps, 25.00%
    1024 |#######################                               4597.74Mbps,  561247/ 1197318pps, 46.88%
    1280 |#############################                         5764.50Mbps,  562939/  961538pps, 58.55%
    1408 |###############################                       6181.42Mbps,  548777/  875350pps, 62.69%
    1518 |##################################                    6785.59Mbps,  558761/  812743pps, 68.75%

The best-case results are as follows.

framesize|0G   1G   2G   3G   4G   5G   6G   7G   8G   9G   10Gbps
---------+----+----+----+----+----+----+----+----+----+----+
      64 |###                                                    520.83Mbps, 1017253/14880952pps,  6.84%
     128 |#####                                                  945.95Mbps,  923776/ 8445945pps, 10.94%
     512 |######################                                4325.96Mbps, 1056142/ 2349624pps, 44.95%
    1024 |#########################################             8059.75Mbps,  983856/ 1197318pps, 82.17%
    1280 |##################################################    9844.81Mbps,  961407/  961538pps, 99.99%
    1408 |############################################          8625.06Mbps,  765719/  875350pps, 87.48%
    1518 |###############################################       9253.08Mbps,  761947/  812743pps, 93.75%

These two cases were run under the same conditions: the same machine,
the same NIC, the same kernel, just tested again.

While testing, I see many IPIs in the `systat vm` output, as follows.

```
   1 users Load 0.75 0.22 0.08                    openiavf.yuisoft.co 15:45:33

            memory totals (in KB)           PAGING   SWAPPING      Interrupts
           real   virtual     free          in  out   in  out    186338 total
Active    34600     34600  7584080   ops                                 com0
All      505348    505348 16489452   pages                               mpi0
                                                                        uhci0
Proc:r  d  s  w    Csw    Trp    Sys    Int    Sof  Flt      forks       iavf0
     2    54    330462    147  20281           101   69      fkppw 16218 iavf0:0
                                                             fksvm       iavf0:1
  5.0%Int   5.0%Spn  24.5%Sys   0.0%Usr  65.5%Idle           pwait       iavf0:3
|    |    |    |    |    |    |    |    |    |    |          relck       iavf1
|||@@============                                            rlkok       iavf1:0
                                                             noram  4063 iavf1:2
Namei         Sys-cache    Proc-cache    No-cache            ndcpy       iavf1:3
    Calls     hits    %    hits     %    miss   %            fltcp     3 vmx0:0
                                                             zfod      1 vmx0:1
                                                             cow         vmx0:2
Disks   sd0   cd0                                    67411   fmin      3 vmx0:3
seeks                                                89881   ftarg   523 clock
xfers                                                        itarg 165527 ipi
speed                                                    2   wired
  sec                                                        pdfre
                                                             pdscn
                                                             pzidl 502484 IPKTS
                                                          12 kmape 502499 OPKTS
```

I'm running OpenBSD-current on an ESXi virtual machine; the IPIs are
used to wake up idle CPUs.

In a network driver, RX and TX queue processing runs on the softnet
taskqs. I checked how many packets are processed on these queues by
adding TRACEPOINT() calls to the ifq_start() and ifiq_process()
functions. The ifq_start() function triggers the TX taskq, and the
ifiq_process() function is called from the RX taskq.

diff --git a/sys/net/ifq.c b/sys/net/ifq.c
index 3c3b141fb58..84913242965 100644
@@ -121,6 +122,8 @@ ifq_serialize(struct ifqueue *ifq, struct task *t)
 void
 ifq_start(struct ifqueue *ifq)
 {
+	TRACEPOINT(ifq, start, ifq_len(ifq));
+
 	if (ifq_len(ifq) >= min(ifq->ifq_if->if_txmit, ifq->ifq_maxlen)) {
 		task_del(ifq->ifq_softnet, &ifq->ifq_bundle);
 		ifq_run_start(ifq);
@@ -862,6 +865,8 @@ ifiq_process(void *arg)
 	ml_init(&ifiq->ifiq_ml);
 	mtx_leave(&ifiq->ifiq_mtx);
 
+	TRACEPOINT(ifiq, process, ml_len(&ml));
+
 	if_input_process(ifiq->ifiq_if, &ml);
 }
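For reference, the diff above only adds the TRACEPOINT() calls;
btrace(8) can only attach to these probes once they are also registered
with the static probe provider of dt(4). Below is a minimal sketch of
what that registration might look like, modeled on the existing entries
in sys/dev/dt/dt_prov_static.c; the two entries and their "unsigned
int" argument types are illustrative assumptions, not part of the
posted diff.

```
/*
 * sys/dev/dt/dt_prov_static.c: declare the two probes, each carrying
 * one argument (the queue length passed by the TRACEPOINT() calls).
 */
DT_STATIC_PROBE1(ifq, start, "unsigned int");
DT_STATIC_PROBE1(ifiq, process, "unsigned int");

/* The probes must also be listed in the static probe array. */
struct dt_probe *const dtps_static[] = {
	/* ... existing probes ... */
	&_DT_STATIC_P(ifq, start),
	&_DT_STATIC_P(ifiq, process),
};
```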
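With the probes compiled in (and the kern.allowdt sysctl set so that
dt(4) can be opened), `btrace -l` should list them, and a quick
sanity-check one-liner such as the following (a sketch using the
standard count() aggregation from bt(5)) shows whether ifq:start fires
at all before running the full script below:

```
tracepoint:ifq:start { @calls = count(); }
```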
The btrace script is shown as follows.

```
tracepoint:ifq:start {
	@start = lhist(arg0, 0, 10, 1);
}

tracepoint:ifiq:process {
	@process = lhist(arg0, 0, 100, 10);
}
```

The btrace results are shown as follows.

```
@start:
[1, 2)               233557 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

@process:
[0, 10)                5341 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[10, 20)               1316 |@@@@@@@@@@@@                                        |
[20, 30)                893 |@@@@@@@@                                            |
[30, 40)                554 |@@@@@                                               |
[40, 50)                717 |@@@@@@                                              |
[50, 60)                 83 |                                                    |
[60, 70)                 23 |                                                    |
[70, 80)                168 |@                                                   |
[80, 90)                112 |@                                                   |
[90, 100)              1004 |@@@@@@@@@                                           |
```

This means that the TX taskq processes only one packet at a time, while
the RX side batches up to about 100 packets per run. If ifq_len()
returns 1, the TX taskq is always kicked by the following code, and an
idle CPU is often woken up (by an IPI) to run it.

```
void
ifq_start(struct ifqueue *ifq)
{
	if (ifq_len(ifq) >= min(ifq->ifq_if->if_txmit, ifq->ifq_maxlen)) {
		task_del(ifq->ifq_softnet, &ifq->ifq_bundle);
		ifq_run_start(ifq);
	} else
		task_add(ifq->ifq_softnet, &ifq->ifq_bundle);
}
```

If ifq_len() returns a value greater than or equal to if_txmit, the TX
taskq isn't kicked and ifq_run_start() is called directly. So I set
if_txmit = 1 in the iavf driver: then ifq_len(ifq) >= min(1,
ifq->ifq_maxlen) holds whenever the queue is non-empty, so packets are
always dispatched directly and the taskq wakeup is skipped. The packet
forwarding performance becomes stable. The average performance is
around 800k pps.

framesize|0G   1G   2G   3G   4G   5G   6G   7G   8G   9G   10Gbps
---------+----+----+----+----+----+----+----+----+----+----+
      64 |###                                                    440.85Mbps,  861035/14880952pps,  5.79%
     128 |#####                                                  883.72Mbps,  863011/ 8445945pps, 10.22%
     512 |#################                                     3308.29Mbps,  807687/ 2349624pps, 34.38%
    1024 |###################################                   6895.36Mbps,  841719/ 1197318pps, 70.30%
    1280 |############################################          8623.26Mbps,  842115/  961538pps, 87.58%
    1408 |#################################################     9619.30Mbps,  853986/  875350pps, 97.56%
    1518 |############################################          8636.21Mbps,  711150/  812743pps, 87.50%

OK?

diff --git a/sys/dev/pci/if_iavf.c b/sys/dev/pci/if_iavf.c
index 204dbfc2637..bcf345de9ec 100644
--- a/sys/dev/pci/if_iavf.c
+++ b/sys/dev/pci/if_iavf.c
@@ -1052,6 +1052,7 @@ iavf_attach(struct device *parent, struct device *self, void *aux)
 	ifp->if_softc = sc;
 	ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
 	ifp->if_xflags = IFXF_MPSAFE;
+	ifp->if_txmit = 1;
 	ifp->if_ioctl = iavf_ioctl;
 	ifp->if_qstart = iavf_start;
 	ifp->if_watchdog = iavf_watchdog;

--
Yuichiro NAITO (naito.yuichiro@gmail.com)