timing lld --threads for fun and profit
On Fri, Nov 08, 2024 at 01:01:50PM +0000, Stuart Henderson wrote:
> On 2024/11/08 13:26, Martin Pieuchot wrote:
> > On 08/11/24(Fri) 12:22, Landry Breuil wrote:
> > > [...]
> > > someone(tm) should look into patching lld to avoid using more than
> > > MAX(ncpu,5) threads ? In the meantime, i'll probably fix the firefox
> > > ports to avoid using MAKE_JOBS for lld but cap it at 5.
> >
> > Recent versions of lld(1) include commits which limit the value to
> > 16 by default instead of the number of available CPUs.
> >
> > See the following commits:
> >
> > https://github.com/llvm/llvm-project/commit/a8788de1c3f3c8c3a591bd3aae2acee1b43b229a
> > https://github.com/llvm/llvm-project/commit/da68d2164efcc1f5e57f090e2ae2219056b120a0
> >
> > Robert, do you see the same with chromium? Would it make sense to
> > backport these diffs with a smaller value for OpenBSD?
> >
>
> Certainly helps for reorder_kernel.
>
> $ sysctl hw.{model,ncpu,version}
> hw.model=12th Gen Intel(R) Core(TM) i5-1245U
> hw.ncpu=12
> hw.version=ThinkPad T14 Gen 3
>
> # \time -l /usr/libexec/reorder_kernel
> 7.30 real 6.86 user 3.92 sys
> 879280 maximum resident set size
> 0 average shared memory size
> 0 average unshared data size
> 0 average unshared stack size
> 147812 minor page faults
> 79421 major page faults
> 0 swaps
> 11833 block input operations
> 15643 block output operations
> 1 messages sent
> 0 messages received
> 45 signals received
> 46512 voluntary context switches
> 7353 involuntary context switches
> # vi Makefile
> [...]
> $ grep ^LINKFL Makefile
> LINKFLAGS= -T ld.script -X --warn-common -nopie -Wl,--threads=5
> LINKFLAGS+= -S
> # \time -l /usr/libexec/reorder_kernel
> 0.41 real 0.26 user 0.09 sys
> 100920 maximum resident set size
> 0 average shared memory size
> 0 average unshared data size
> 0 average unshared stack size
> 17119 minor page faults
> 10 major page faults
> 0 swaps
> 5 block input operations
> 78 block output operations
> 1 messages sent
> 0 messages received
> 28 signals received
> 189 voluntary context switches
> 2 involuntary context switches
>
There's more to this: even specifying -Wl,--threads=1 gives me a big
speedup, and so does any other value. Looking in the log I see:
ld: error: unknown argument '-Wl,--threads=1'
so ld bails out right away, which would explain the "speedup".
The correct flag is --threads=N.
With that, I don't see a lot of difference between the various values of N.
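To avoid being fooled by a failed link like the one above, a timing
sweep should check the exit status before trusting the numbers. A rough
sketch of that idea in Python; time_command, sweep_threads and make_argv
are made-up names for illustration, and the real link command is not
shown in this thread, so the caller has to supply it:

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return (exit status, wall-clock seconds)."""
    start = time.perf_counter()
    proc = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return proc.returncode, time.perf_counter() - start


def sweep_threads(make_argv, counts):
    """Time make_argv(n) for each n in counts.

    make_argv is a caller-supplied (hypothetical) function returning the
    full command line for a given thread count, e.g. one that contains
    --threads=N. Runs that exit non-zero are reported and discarded, so a
    command that fails early is not mistaken for a fast one.
    """
    results = {}
    for n in counts:
        status, seconds = time_command(make_argv(n))
        if status != 0:
            print(f"--threads={n}: exit {status}, timing discarded",
                  file=sys.stderr)
            continue
        results[n] = seconds
    return results
```

A single run per value is noisy, so repeating each measurement a few
times would be a sensible next step before drawing conclusions.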
-Otto