timing lld --threads for fun and profit
On Thu, Nov 14, 2024 at 09:42:50PM +0000, Geoff Steckel wrote:
> On 11/14/24 13:05, Mateusz Guzik wrote:
> [snip]
> > On Thu, Nov 14, 2024 at 11:05:16AM +0000, Martin Pieuchot wrote:
> >> On 08/11/24(Fri) 13:01, Stuart Henderson wrote:
> >> The one from wget I'm currently playing with is 2 times slower on a
> >> 24-CPU machine with GENERIC.MP than with GENERIC.
> >>
> >> The attached FlameGraph corresponds to non-idle time spent in the
> >> kernel during the execution of the wget configure script. ~20% of sys
> >> time is spent in sys_futex/rw_lock contention, which corresponds to
> >> lld(1) threads spinning.
> >
> > 16 is still way too high of a count -- the vast majority of binaries
> > have little to gain from parallel linking to begin with, and lld itself
> > does not scale all that great either. Finally, if there are multiple
> > llds running at the same time (think a parallel port build or system
> > build), that's just plain waste.
> >
> > See this for some measurements:
> > https://discourse.llvm.org/t/avoidable-overhead-from-threading-by-default/69160
> >
> > In FreeBSD I patched it down to 4 threads at most by default. The
> > --threads parameter can be used to override it.
>
> A possibly related scenario: processes chained by pipes
> nprocs >= NCPUS: system almost unresponsive, spin time ~= 100%

This is most likely a bug as opposed to "merely" a scalability problem.

> nprocs == NCPUS - 1: various CPUs up to 30% spin time
> nprocs < NCPUS - 1: spin time < 5%
>
> Thanks to Stuart Henderson for making waterfall measurements &
> interpreting them.
>
> While this is an extreme case of contention for a single lock, throwing
> a dart at the problem suggests lock contention in the file system
> causes throughput to top out.
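For anyone who wants to poke at the pipe-chaining scenario above, here is
a minimal sketch of the kind of workload I understand it to be (plain
POSIX sh; N and the dd sizes are made-up knobs, adjust to taste):

    #!/bin/sh
    # Sketch: push data through a chain of N processes connected by pipes.
    # N >= number of CPUs is the regime where spin time was reported to
    # approach 100%.
    N=24                       # assumption: set to (or above) your CPU count
    i=0
    cmd="dd if=/dev/zero bs=64k count=50000 2>/dev/null"
    while [ "$i" -lt "$N" ]; do
            cmd="$cmd | cat"
            i=$((i + 1))
    done
    eval "$cmd >/dev/null"

Varying N around NCPUS while watching top(1) should make it easy to
compare the nprocs >= NCPUS and nprocs < NCPUS cases Geoff describes.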
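For reference, the --threads override mentioned above is a per-link flag,
so it can also be forced from build glue without patching lld. Roughly
(this assumes a reasonably recent lld; older versions only had a boolean
--threads/--no-threads, and the exact driver spelling may vary):

    # Direct linker invocation: cap lld at 4 worker threads.
    ld.lld --threads=4 -o prog main.o util.o

    # Through the compiler driver, e.g. in LDFLAGS:
    cc -fuse-ld=lld -Wl,--threads=4 -o prog main.o util.o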