
From: Mateusz Guzik <mjguzik@gmail.com>
Subject: Re: timing lld --threads for fun and profit
To: Geoff Steckel <gwes@oat.com>
Cc: tech@openbsd.org
Date: Thu, 14 Nov 2024 23:26:38 +0100


On Thu, Nov 14, 2024 at 09:42:50PM +0000, Geoff Steckel wrote:
> On 11/14/24 13:05, Mateusz Guzik wrote:
> [snip]
> > On Thu, Nov 14, 2024 at 11:05:16AM +0000, Martin Pieuchot wrote:
> >> On 08/11/24(Fri) 13:01, Stuart Henderson wrote:
> >> The one from wget I'm currently playing with is twice as slow on a
> >> 24-CPU machine with GENERIC.MP than with GENERIC.
> >>
> >> FlameGraph attached corresponds to non-idle time spent in kernel during
> >> the execution of the wget configure script.  ~20% of sys time is spent
> >> in sys_futex/rw_lock contention, which corresponds to lld(1) threads spinning.
> > 16 is still way too high of a count -- the vast majority of binaries have
> > little to gain from parallel linking to begin with and lld itself does
> > not scale all that great either. Finally, if there are multiple llds
> > running at the same time (think parallel port build or system build)
> > that's just plain waste.
> >
> > See this for some measurements: https://discourse.llvm.org/t/avoidable-overhead-from-threading-by-default/69160
> >
> > In FreeBSD I patched it down to 4 threads at most by default. The
> > --threads parameter can be used to override it.
> >
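
To make the default-cap policy quoted above concrete, here is a minimal
sketch. It is not the actual FreeBSD patch or lld code: the function
name, parameters, and error handling are hypothetical, and only the cap
of 4 and the idea of an explicit --threads override come from the text.

```python
import os

# Cap taken from the quoted text; every name below is hypothetical.
DEFAULT_MAX_THREADS = 4


def pick_thread_count(requested=None, ncpus=None):
    """Return the number of linker threads to use.

    requested: an explicit --threads style value, or None if not given.
    ncpus: number of CPUs; falls back to os.cpu_count() when None.
    """
    if requested is not None:
        # An explicit request overrides the default cap.
        if requested < 1:
            raise ValueError("thread count must be >= 1")
        return requested
    if ncpus is None:
        ncpus = os.cpu_count() or 1
    # Default: use the available CPUs, but never more than the cap.
    return max(1, min(ncpus, DEFAULT_MAX_THREADS))
```

Under these assumptions a 24-CPU machine would default to 4 threads, a
2-CPU machine to 2, and an explicit request of 16 would still be honored.
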
> A possibly related scenario: processes chained by pipes
>   nprocs >= NCPUS: system almost unresponsive, spin time ~= 100%

This is most likely a bug as opposed to "merely" a scalability problem.

>   nprocs == NCPUS - 1: various CPUs up to 30% spin time
>   nprocs <  NCPUS - 1: spin time < 5%
> Thanks to Stuart Henderson for making waterfall measurements & 
> interpreting them.
> 
> While this is an extreme case of contention for a single lock,
> throwing a dart at the problem suggests lock contention in the file
> system causes throughput to top out.
>