From: Geoff Steckel <gwes@oat.com>
Subject: Re: timing lld --threads for fun and profit
To: tech@openbsd.org
Date: Thu, 14 Nov 2024 16:42:50 -0500

On 11/14/24 13:05, Mateusz Guzik wrote:
[snip]
> On Thu, Nov 14, 2024 at 11:05:16AM +0000, Martin Pieuchot wrote:
>> On 08/11/24(Fri) 13:01, Stuart Henderson wrote
>> The one from wget I'm currently playing with is 2 times slower on a
>> 24CPUs machine with GENERIC.MP than with GENERIC.
>>
>> FlameGraph attached corresponds to non-idle time spent in kernel during
>> the execution of the wget configure script.  ~20% of sys time is spent
>> in sys_futex/rw_lock contention, which corresponds to lld(1) threads spinning.
> 16 is still way too high of a count -- vast majority of binaries have
> little to gain from parallel linking to begin with and lld itself does
> not scale all that great either. Finally, if there are multiple llds
> running at the same time (think parallel port build or system build)
> that's just plain waste.
>
> See this for some measurements: https://discourse.llvm.org/t/avoidable-overhead-from-threading-by-default/69160
>
> In FreeBSD I patched it down to 4 threads at most by default. The
> --threads parameter can be used to override it.
>
A possibly related scenario is a set of processes chained by pipes:
  nprocs >= NCPUS: system almost unresponsive, spin time ~= 100%
  nprocs == NCPUS - 1: various CPUs up to 30% spin time
  nprocs <  NCPUS - 1: spin time < 5%
Thanks to Stuart Henderson for making the waterfall measurements and
interpreting them.
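
For anyone who wants to try the shape of this scenario, here is a minimal
sketch and not the workload that was actually measured. It assumes a
cat(1) binary on PATH as a stand-in for each stage; the function name
run_pipe_chain is hypothetical. It only builds the chain and pushes data
through it; spin time has to be observed separately while it runs.

```python
import subprocess
import threading


def run_pipe_chain(nprocs, payload):
    """Push payload through nprocs `cat` processes connected by pipes.

    Assumption: `cat` stands in for the real pipeline stages.
    Returns the bytes that come out of the last stage.
    """
    if nprocs < 1:
        raise ValueError("need at least one process in the chain")

    procs = []
    upstream = subprocess.PIPE  # stdin of the first stage is fed by us
    for _ in range(nprocs):
        p = subprocess.Popen(["cat"], stdin=upstream, stdout=subprocess.PIPE)
        if procs:
            # The child now owns the read end; drop the parent's copy.
            procs[-1].stdout.close()
        procs.append(p)
        upstream = p.stdout

    def feed():
        # Write in a separate thread so a large payload cannot deadlock
        # against the read of the last stage below.
        procs[0].stdin.write(payload)
        procs[0].stdin.close()

    writer = threading.Thread(target=feed)
    writer.start()
    out = procs[-1].stdout.read()
    writer.join()
    for p in procs:
        p.wait()
    return out
```

To probe the three cases above, one would call it with nprocs set to
os.cpu_count(), os.cpu_count() - 1, and something smaller, using a
payload large enough to keep every stage busy for a while.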

This is an extreme case of contention for a single lock, but as a
first guess (a dart throw, not a measurement), lock contention in the
file system is what causes throughput to top out.

geoff steckel