From: Mateusz Guzik
Subject: Re: [PATCH] amd64: import optimized memcmp from FreeBSD
To: Mark Kettenis
Cc: tech@openbsd.org
Date: Sat, 30 Nov 2024 06:48:00 +0100

Previously I used a microbenchmark which performs open + close, so that's 2 syscalls. I was too lazy to add stat to it, which I'm remedying now.

On a stock kernel (with my memcmp change reverted) I'm seeing about 650k stats/s. In contrast, Linux on the same hardware does over 4 million (more than 6x the OpenBSD rate). This is with mitigations and whatnot enabled, and I happen to know that at least 10% is lost to plain inefficiencies in their case.

Note I'm not trying to compare scalability here, merely single-threaded performance.

I obtained a profile while running the bench and did some post-processing. The top functions are:

   13 copyout
   14 memset
   14 rw_do_enter_write
   15 vget
   15 Xsyscall_meltdown
   17 splraise
   29 SipHash_End
   31 userland
   62 mtx_enter
   69 memcpy
   74 memcmp
  164 spllower

The number is how many samples each function got, so it is only meaningful relative to the rest. The total over all samples is 739, meaning memcmp alone accounts for about 10% of CPU time (74/739).

Also note that most of the problematic code executes with the kernel lock held, further exacerbating the scalability issue. So I'm confident that a pass which sorts all of this out helps more than just microbenchmarks.

btw, how do you generate a flamegraph from these?
The output seems compatible with bpftrace, but piping the output of stackcollapse-bpftrace.pl into flamegraph.pl gives me an empty file.

How to:

  btrace -e 'profile:hz:99 { @[kstack] = count(); }' > stat
  # leave it running for an arbitrary time
  # whack the idle stacks from the output

  # utterly garbage one-liner for post-processing, don't judge me
  cat stat | perl ~/repos/FlameGraph/stackcollapse-bpftrace.pl | sed 's/.*;//' | awk '{ print $2 " " $1 }' | sed 's/\+.*//' | awk '{ s[$2] += $1 } END { for (f in s) print s[f] " " f }' | sort -n

  # same, but get the total
  cat stat | perl ~/repos/FlameGraph/stackcollapse-bpftrace.pl | sed 's/.*;//' | awk '{ print $2 " " $1 }' | sed 's/\+.*//' | awk '{ s[$2] += $1 } END { for (f in s) print s[f] " " f }' | sort -n | awk '{ sum += $1 } END { print sum }'

-- 
Mateusz Guzik
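For readability, the post-processing pipeline above can be sketched as a small script that does the same per-leaf aggregation and also prints the total and each function's share, which is how the "memcmp is 10%" figure falls out (74/739). This is a hypothetical sketch, not part of any tool: it assumes the input is already in the folded "outer;...;leaf+0xoff count" format emitted by stackcollapse-bpftrace.pl, and the names leaf_totals and report are made up for illustration.

```python
import sys
from collections import Counter


def leaf_totals(lines):
    """Sum sample counts per leaf (innermost) frame of folded stacks.

    Assumes each line looks like "outer;...;leaf+0x1f 12", i.e. the
    folded format produced by stackcollapse-bpftrace.pl (an assumption).
    """
    totals = Counter()
    for line in lines:
        line = line.strip()
        stack, sep, count = line.rpartition(" ")
        if not sep:
            continue  # skip blank or malformed lines
        leaf = stack.rsplit(";", 1)[-1]  # innermost frame, like sed 's/.*;//'
        leaf = leaf.split("+", 1)[0]     # drop "+offset", like sed 's/\+.*//'
        totals[leaf] += int(count)
    return totals


def report(totals):
    """Return lines "count name (share%)" sorted ascending, then the total."""
    grand = sum(totals.values())
    rows = sorted(totals.items(), key=lambda kv: (kv[1], kv[0]))  # like sort -n
    out = [f"{n} {name} ({100 * n / grand:.1f}%)" for name, n in rows]
    out.append(f"total {grand}")
    return out


if __name__ == "__main__":
    # Hypothetical usage:
    #   perl stackcollapse-bpftrace.pl stat | python3 leaf_totals.py
    for row in report(leaf_totals(sys.stdin)):
        print(row)
```

Idle stacks still need to be removed from the input beforehand, as with the one-liner.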