[PATCH] amd64: import optimized memcmp from FreeBSD
Previously I used a microbenchmark which performs open + close, so
that's 2 syscalls. I was too lazy to add stat into it, which I'm
remedying now.
On a stock kernel (my memcmp change reverted) I'm seeing about 650k
stats/s. In contrast, Linux on the same hardware performs over 4 million
(over 6x the OpenBSD rate). This is with mitigations and whatnot
enabled and I happen to know there is at least 10% lost due to plain
inefficiencies in their case. Note I'm not trying to compare
scalability here, merely single-threaded performance.
I obtained a profile while running the bench and did some post-processing.
Top funcs are:
13 copyout
14 memset
14 rw_do_enter_write
15 vget
15 Xsyscall_meltdown
17 splraise
29 SipHash_End
31 userland
62 mtx_enter
69 memcpy
74 memcmp
164 spllower
The number is how many samples each function got, so it is only
meaningful relative to the rest.
The total across all functions is 739, meaning memcmp alone accounts
for about 10% of CPU time.
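For reference, the share arithmetic can be checked with a tiny helper. This is only a sketch: share is a made-up name, and the counts are the ones from the list above. Note that 739 is the total over the whole profile, while the functions listed add up to 517, so the rest sits below the cutoff.

```sh
# Total sample count over the whole profile, as reported above.
total=739

# share N: print N as a percentage of $total, with one decimal.
# (hypothetical helper, not part of any tool mentioned here)
share() {
    awk -v n="$1" -v t="$total" 'BEGIN { printf "%.1f\n", 100 * n / t }'
}

share 74     # memcmp   -> prints 10.0
share 164    # spllower -> prints 22.2
```
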
Also note the majority of this executes with the kernel lock held,
further exacerbating the scalability issue.
So I'm confident that a pass which sorts all of this out helps more
than just microbenchmarks.
> btw how do you generate a flamegraph from these? the output seems
> compatible with bpftrace, but piping stackcollapse-bpftrace.pl to
> flamegraph.pl gives me an empty file
how to:
# leave it running for an arbitrary time
btrace -e 'profile:hz:99 { @[kstack] = count(); }' > stat
# whack idle from the output
# utterly garbage one liner for post processing, don't judge me
cat stat | perl ~/repos/FlameGraph/stackcollapse-bpftrace.pl |
    sed 's/.*;//' | awk '{ print $2 " " $1 }' | sed 's/\+.*//' |
    awk '{ s[$2] += $1 } END { for (f in s) print s[f] " " f }' | sort -n
# same but get the total
cat stat | perl ~/repos/FlameGraph/stackcollapse-bpftrace.pl |
    sed 's/.*;//' | awk '{ print $2 " " $1 }' | sed 's/\+.*//' |
    awk '{ s[$2] += $1 } END { for (f in s) print s[f] " " f }' | sort -n |
    awk '{ sum += $1 } END { print sum }'
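If you would rather not run the pipeline twice, the same post-processing can be sketched as a single awk pass that prints both the per-function counts and the total. This is not what I actually ran; leaf_counts is a made-up name, and it assumes the collapsed input has the shape "frame;frame;...;leaf+0xNN count", which is what the sed expressions above rely on as well.

```sh
# leaf_counts: read collapsed stacks on stdin, attribute each sample count
# to the last (leaf) frame, and print "count function" sorted by count,
# followed by a TOTAL line. Hypothetical helper, assumed input format:
#   frame1+0x10;frame2+0x20;leaf+0x30 COUNT
leaf_counts() {
    awk '{
        count = $NF                     # last field is the sample count
        stack = $0
        sub(/ [0-9]+$/, "", stack)      # drop the trailing count
        n = split(stack, frames, ";")   # frames[n] is the leaf frame
        leaf = frames[n]
        sub(/\+.*/, "", leaf)           # strip the +offset suffix
        s[leaf] += count
        total += count
    }
    END {
        for (f in s) print s[f], f
        print total, "TOTAL"
    }' | sort -n
}
```

Usage, with the same file and script path as above:

perl ~/repos/FlameGraph/stackcollapse-bpftrace.pl < stat | leaf_counts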
--
Mateusz Guzik <mjguzik gmail.com>