timing lld --threads for fun and profit
On 11/14/24 13:05, Mateusz Guzik wrote:
[snip]
> On Thu, Nov 14, 2024 at 11:05:16AM +0000, Martin Pieuchot wrote:
>> On 08/11/24(Fri) 13:01, Stuart Henderson wrote
>> The one from wget I'm currently playing with is 2 times slower on a
>> 24-CPU machine with GENERIC.MP than with GENERIC.
>>
>> FlameGraph attached corresponds to non-idle time spent in kernel during
>> the execution of the wget configure script. ~20% of sys time is spent
>> in sys_futex/rw_lock contention, which corresponds to lld(1) threads
>> spinning.
>
> 16 is still way too high of a count -- the vast majority of binaries have
> little to gain from parallel linking to begin with, and lld itself does
> not scale all that great either. Finally, if there are multiple llds
> running at the same time (think parallel port build or system build)
> that's just plain waste.
>
> See this for some measurements:
> https://discourse.llvm.org/t/avoidable-overhead-from-threading-by-default/69160
>
> In FreeBSD I patched it down to 4 threads at most by default. The
> --threads parameter can be used to override it.

A possibly related scenario: processes chained by pipes.

nprocs >= NCPUS:     system almost unresponsive, spin time ~= 100%
nprocs == NCPUS - 1: various CPUs up to 30% spin time
nprocs < NCPUS - 1:  spin time < 5%

Thanks to Stuart Henderson for making waterfall measurements and
interpreting them.

While this is an extreme case of contention for a single lock, throwing
a dart at the problem suggests lock contention in the file system causes
throughput to top out.

geoff steckel