From: Otto Moerbeek
Subject: Re: timing lld --threads for fun and profit
To: Landry Breuil , robert@openbsd.org, tech@openbsd.org
Date: Sun, 10 Nov 2024 17:20:04 +0100

On Fri, Nov 08, 2024 at 01:01:50PM +0000, Stuart Henderson wrote:
> On 2024/11/08 13:26, Martin Pieuchot wrote:
> > On 08/11/24(Fri) 12:22, Landry Breuil wrote:
> > > [...]
> > > someone(tm) should look into patching lld to avoid using more than
> > > MAX(ncpu,5) threads ? In the meantime, i'll probably fix the firefox
> > > ports to avoid using MAKE_JOBS for lld but cap it at 5.
> >
> > Recent lld(1) include the following commits which limit the value to
> > 16 by default instead of the number of available CPUs.
> >
> > See the following commits:
> >
> > https://github.com/llvm/llvm-project/commit/a8788de1c3f3c8c3a591bd3aae2acee1b43b229a
> > https://github.com/llvm/llvm-project/commit/da68d2164efcc1f5e57f090e2ae2219056b120a0
> >
> > Robert, do you see the same with chromium? Would it make sense to
> > backport these diffs with a smaller value for OpenBSD?
> >
>
> Certainly helps for reorder_kernel.
>
> $ sysctl hw.{model,ncpu,version}
> hw.model=12th Gen Intel(R) Core(TM) i5-1245U
> hw.ncpu=12
> hw.version=ThinkPad T14 Gen 3
>
> # \time -l /usr/libexec/reorder_kernel
>         7.30 real         6.86 user         3.92 sys
>     879280  maximum resident set size
>          0  average shared memory size
>          0  average unshared data size
>          0  average unshared stack size
>     147812  minor page faults
>      79421  major page faults
>          0  swaps
>      11833  block input operations
>      15643  block output operations
>          1  messages sent
>          0  messages received
>         45  signals received
>      46512  voluntary context switches
>       7353  involuntary context switches
> # vi Makefile
> [...]
> $ grep ^LINKFL Makefile
> LINKFLAGS= -T ld.script -X --warn-common -nopie -Wl,--threads=5
> LINKFLAGS+= -S
> # \time -l /usr/libexec/reorder_kernel
>         0.41 real         0.26 user         0.09 sys
>     100920  maximum resident set size
>          0  average shared memory size
>          0  average unshared data size
>          0  average unshared stack size
>      17119  minor page faults
>         10  major page faults
>          0  swaps
>          5  block input operations
>         78  block output operations
>          1  messages sent
>          0  messages received
>         28  signals received
>        189  voluntary context switches
>          2  involuntary context switches
>

There's more to this: even specifying -Wl,--threads=1 gives me a big
speedup. In fact, any value does. Looking in the log I see:

ld: error: unknown argument '-Wl,--threads=1'

The correct flag is --threads=N, without the -Wl, prefix, since these
flags are handed to ld directly.

With the correct flag, I don't see much difference between the various
values.

	-Otto
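For reference, a corrected version of the LINKFLAGS lines quoted above might look like the sketch below. It assumes, as the ld error message suggests, that the kernel Makefile passes LINKFLAGS to ld itself rather than through cc, so the -Wl, prefix (which only the compiler driver understands) has to go. With the -Wl, form, ld presumably rejects the argument and the link fails early, which would account for the implausibly fast 0.41 s run. The value 5 is just the one used in the test above, not a recommendation.

```make
# Kernel Makefile fragment (sketch): LINKFLAGS is assumed to be passed to
# ld as-is, so use lld's own --threads=N option, not the cc driver's -Wl, form.
LINKFLAGS=	-T ld.script -X --warn-common -nopie --threads=5
LINKFLAGS+=	-S
```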