From: Andy Lemin
Subject: Re: UDP parallel input
To: Hrvoje Popovski, alexander.bluhm@gmx.net
Cc: tech@openbsd.org
Date: Fri, 26 Jul 2024 23:02:23 +1000

Hi,

Ahhh, yes, it makes total sense why you excluded mcx. Thanks for sharing the
reasoning. I had also noticed that the Mellanox driver does not print the
queue count in dmesg, even though I knew it supports multiple queues.

Thanks for the suggestion Hrvoje, I never knew you could do that! I have
never needed to increase the queue count, as I only run CPUs with the
fastest possible clock speeds for OpenBSD (for reasons everyone here already
knows), which means just 4 cores.

I can't test -current yet, as the only boxes I have Mellanox cards in are
production. I will test -current using vmx in the meantime.

Honestly, it is really so cool to see this work! You guys are going to have
to arrange a party when Parallel TCP Input drops ;) It may have taken a
while, but it is impressive that the team achieved this without ever once
taking a shortcut.

Andy Lemin

> On 26 Jul 2024, at 22:07, Hrvoje Popovski wrote:
>
> On 26.7.2024. 13:17, Alexander Bluhm wrote:
>> On Fri, Jul 26, 2024 at 05:25:13PM +1000, Andrew Lemin wrote:
>>> Just to clarify, you mention "you need multiple CPUs and network
>>> interfaces that support multiqueue".
>>> Mellanox cards have multiple queues;
>>> https://github.com/openbsd/src/blob/master/sys/dev/pci/if_mcx.c
>>
>> Driver looks like mcx(4) supports it. But I am conservative in my
>> statements. Other drivers write the number of queues in dmesg.
>>
>> bnxt0 at pci4 dev 0 function 0 "Broadcom BCM57412" rev 0x01: fw ver 214.4.91, msix, 8 queues, address 14:23:f2:a0:71:e0
>> igc0 at pci11 dev 0 function 0 "Intel I225-LM" rev 0x03, msix, 4 queues, address 24:5e:be:54:8a:41
>> ix0 at pci6 dev 0 function 0 "Intel 82599" rev 0x01, msix, 8 queues, address 90:e2:ba:d6:23:68
>> ixl0 at pci5 dev 0 function 0 "Intel X710 SFP+" rev 0x02: port 0, FW 6.0.48442 API 1.7, msix, 8 queues, address 40:a6:b7:6e:ad:a0
>> vmx0 at pci11 dev 0 function 0 "VMware VMXNET3" rev 0x01: msix, 4 queues, address 00:0c:29:2e:8e:8b
>>
>> But mcx does not.
>>
>> mcx0 at pci1 dev 0 function 0 "Mellanox ConnectX-4 Lx" rev 0x00: FW 14.23.1020, msix, address b8:59:9f:0e:57:54
>>
>> Unfortunately all my mcx cards are in ARM machines, I don't have
>> them plugged into my Intel setup. So I don't know if multiqueue does
>> work on mcx on arm64, or if they are just missing the print.
>>
>> Did you test UDP parallel input on mcx(4)? Does it work? Do you
>> see performance improvements?
>>
>> Maybe we could add a queues print for consistency. Maybe I can
>> rearrange my hardware setup to do performance tests also for mcx.
>>
>> bluhm
>
> Hi,
>
> mcx can have up to 16 queues depending on the number of CPUs.
> If you want to use all 16 queues then in /sys/net/if.c
>
> #define NET_TASKQ 4
> change to
> #define NET_TASKQ 16
>
>
> smc24# vmstat -iz | grep mcx
> irq84/mcx0              15         0
> irq85/mcx0:0      90103730       669
> irq86/mcx0:1             0         0
> irq87/mcx0:2      97510819       724
> irq88/mcx0:3      45074515       334
> irq89/mcx0:4      95193811       706
> irq90/mcx0:5             0         0
> irq91/mcx0:6             0         0
> irq92/mcx0:7      44891670       333
> irq93/mcx0:8      60500115       449
> irq94/mcx0:9      92661318       688
> irq95/mcx0:10     46096426       342
> irq96/mcx0:11    104819744       778
> irq97/mcx0:12     73970115       549
> irq98/mcx0:13    108794487       807
> irq99/mcx0:14    106790266       793
> irq100/mcx0:15   106398156       790
> irq101/mcx1             16         0
> irq102/mcx1:0    109035477       809
> irq103/mcx1:1     89419836       664
> irq104/mcx1:2    106308913       789
> irq105/mcx1:3     97638171       725
> irq106/mcx1:4    106270647       789
> irq107/mcx1:5            0         0
> irq108/mcx1:6    100930228       749
> irq109/mcx1:7    101061351       750
> irq110/mcx1:8    103688001       769
> irq111/mcx1:9            0         0
> irq114/mcx1:10   107928045       801
> irq115/mcx1:11    56470300       419
> irq116/mcx1:12   109033808       809
> irq117/mcx1:13   108898657       808
> irq118/mcx1:14   105792791       785
> irq119/mcx1:15   104902060       778
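
For anyone following along at home, the reason the NET_TASKQ bump matters is
that incoming packets are dispatched to one of NET_TASKQ softnet task queues,
so with the stock value of 4 only four of mcx's sixteen rings can be serviced
in parallel. The snippet below is only a simplified sketch of that dispatch,
with names invented for the illustration (softnet_tq, pick_softnet_tq); it is
not the actual sys/net/if.c code.

/*
 * Simplified sketch (not the real sys/net/if.c code) of why NET_TASKQ
 * caps parallelism: each flow is mapped onto one of NET_TASKQ softnet
 * task queues, so sixteen mcx rings funnel into only four worker
 * threads unless the define is raised.
 */
#define NET_TASKQ	16		/* the stock kernel ships with 4 */

struct taskq;				/* opaque stand-in for the kernel taskq */

struct taskq *softnet_tq[NET_TASKQ];	/* one worker thread behind each */

/* pick the softnet taskq that will process a given flow */
struct taskq *
pick_softnet_tq(unsigned int flowid, unsigned int ncpus)
{
	/* never use more taskqs than CPUs */
	unsigned int n = NET_TASKQ < ncpus ? NET_TASKQ : ncpus;

	return softnet_tq[flowid % n];
}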
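
On bluhm's idea of adding a queues print to mcx(4) for consistency with
bnxt(4)/ix(4)/ixl(4): a rough userland mock-up of what that attach line could
look like is below. The structure and field names (mcx_softc, sc_nqueues,
sc_intrstr) are placeholders invented for the illustration and do not reflect
the real if_mcx.c code.

/*
 * Hypothetical mock-up of a "N queues" print for the mcx(4) attach
 * message, mirroring what bnxt(4)/ix(4)/ixl(4) already report.
 */
#include <stdio.h>

struct mcx_softc {			/* stand-in for the driver softc */
	int		sc_nqueues;	/* rx/tx queue pairs in use */
	const char	*sc_intrstr;	/* e.g. "msix" */
};

static void
mcx_print_queues(const struct mcx_softc *sc)
{
	/* would sit next to the existing FW/address printf in attach */
	printf(", %s, %d queue%s", sc->sc_intrstr, sc->sc_nqueues,
	    sc->sc_nqueues > 1 ? "s" : "");
}

int
main(void)
{
	struct mcx_softc sc = { 16, "msix" };

	printf("mcx0 at pci1 dev 0 function 0 \"Mellanox ConnectX-4 Lx\""
	    " rev 0x00: FW 14.23.1020");
	mcx_print_queues(&sc);
	printf(", address b8:59:9f:0e:57:54\n");
	return 0;
}

With something along those lines the mcx attach output would line up with the
other drivers' "N queues" lines quoted earlier in the thread.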