From: Landry Breuil
Subject: Re: timing lld --threads for fun and profit
To: tech@openbsd.org
Date: Fri, 8 Nov 2024 12:42:13 +0100

On Fri, Nov 08, 2024 at 12:22:32PM +0100, Landry Breuil wrote:
> hi,
>
> so www/mozilla-firefox spends a considerable amount of time linking
> libxul.so, which is the monster library containing everything.
>
> right now the port uses:
> CONFIGURE_ENV += LDFLAGS="-Wl,--threads=${MAKE_JOBS} --ld-path=${WRKDIR}/bin/ld"
>
> which says to lld "use as many threads as cpus you have". mpi@ made me
> look at top -H during linking on 8/12/16-core machines, and we saw that
> only 20% of the time spent by all cores was actual "user" time, the rest
> of the time was wasted in %sys and %spin.
>
> so i instrumented the ld wrapper to actually time the lld process via a
> simple bsd.port.mk patch:
> -_LD_PROGRAM = /usr/bin/ld.lld
> +_LD_PROGRAM = /usr/bin/time /usr/bin/ld.lld
>
> and ran a firefox build loop, giving from 1 to 8 threads to the lld process.
>
> this gave me these numbers (first column is the thread count, times
> are in seconds):
> - on an amd64 kvm VM with 16 cores on slow/old hw:
> 1  7148.53 real   6003.69 user  1037.50 sys
> 2  4365.36 real   6755.63 user  1665.59 sys
> 3  3263.71 real   7589.08 user  1613.45 sys
> 4  2626.83 real   7903.39 user  1841.40 sys
> 5  2847.14 real   9818.02 user  2420.12 sys
> 6  2883.57 real  11454.10 user  3338.28 sys
> 7  2426.03 real  10754.16 user  3358.06 sys
> 8  2334.40 real  11463.60 user  3788.33 sys
>
> - on the 8-core arm64 x13s:
> 1  5173.76 real   4538.51 user   613.66 sys
> 2  2750.27 real   4693.69 user   688.53 sys
> 3  1768.95 real   4382.10 user   717.15 sys
> 4  1407.99 real   4566.04 user   759.40 sys
> 5  1146.45 real   4438.68 user   856.93 sys
> 6  1152.81 real   4964.03 user  1228.58 sys
> 7  1466.82 real   6393.65 user  2460.71 sys
> 8  1235.53 real   6000.14 user  2420.20 sys
>
> - on the 12-core arm64 omnibook X14:
> 1  3984.44 real   3493.92 user   468.64 sys
> 2  2119.35 real   3582.04 user   536.62 sys
> 3  1522.39 real   3660.79 user   663.06 sys
> 4  1572.70 real   4227.21 user  1279.55 sys
> 5  1097.43 real   4091.75 user   876.22 sys
> 6  1085.70 real   4326.69 user  1289.30 sys
> 7  1369.48 real   4988.69 user  2934.72 sys
> 8  1222.77 real   5150.22 user  2945.51 sys
>
> from those numbers, apparently the 'sweet spot' for lld is to not use
> more than 5 threads, above that there's too much cpu contention and
> we're wasting too much time.
>
> someone(tm) should look into patching lld to avoid using more than
> MIN(ncpu,5) threads ? In the meantime, i'll probably fix the firefox
> ports to avoid using MAKE_JOBS for lld but cap it at 5.

Index: Makefile
===================================================================
RCS file: /cvs/ports/www/mozilla-firefox/Makefile,v
diff -u -p -r1.610 Makefile
--- Makefile	5 Nov 2024 09:10:37 -0000	1.610
+++ Makefile	8 Nov 2024 11:39:57 -0000
@@ -51,7 +50,12 @@ CONFIGURE_SCRIPT = ${MODPY_BIN} ${WRKSRC
 CONFIGURE_ARGS += --prefix=${PREFIX}
 MAKE_ENV += BUILD_VERBOSE_LOG="1" CARGOFLAGS="-j${MAKE_JOBS}"
 CONFIGURE_ENV += CPPFLAGS=-Wno-backend-plugin
+NCPU !!= sysctl -n hw.ncpuonline
+.if ${NCPU} > 4
+CONFIGURE_ENV += LDFLAGS="-Wl,--threads=5 --ld-path=${WRKDIR}/bin/ld"
+.else
 CONFIGURE_ENV += LDFLAGS="-Wl,--threads=${MAKE_JOBS} --ld-path=${WRKDIR}/bin/ld"
+.endif

this is a bit gross since i can't reuse MAKE_JOBS in the .if test (it is
only defined after bsd.port.mk inclusion), but that's what i plan to
commit to the mozilla ports.

Landry
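
For anyone who wants to check the scaling claim against the numbers, here
is a minimal sketch in Python. The timings are copied from the x13s table
above; the function names (speedup, best_threads, capped_threads) are
hypothetical helpers made up for illustration and are not part of lld or
the ports tree. capped_threads only approximates the diff: the diff falls
back to MAKE_JOBS (not ncpu) when there are 4 or fewer cpus online.

```python
# Timings copied from the 8-core arm64 x13s table:
# threads -> (real, user, sys) in seconds, as printed by /usr/bin/time.
X13S = {
    1: (5173.76, 4538.51, 613.66),
    2: (2750.27, 4693.69, 688.53),
    3: (1768.95, 4382.10, 717.15),
    4: (1407.99, 4566.04, 759.40),
    5: (1146.45, 4438.68, 856.93),
    6: (1152.81, 4964.03, 1228.58),
    7: (1466.82, 6393.65, 2460.71),
    8: (1235.53, 6000.14, 2420.20),
}


def speedup(timings):
    """Wall-clock speedup of each run relative to the 1-thread run."""
    base = timings[1][0]
    return {n: base / real for n, (real, _user, _sys) in timings.items()}


def best_threads(timings):
    """Thread count with the lowest wall-clock (real) time."""
    return min(timings, key=lambda n: timings[n][0])


def capped_threads(ncpu, cap=5):
    """Hypothetical helper: never hand lld more than `cap` threads."""
    return min(ncpu, cap)


if __name__ == "__main__":
    for n, s in sorted(speedup(X13S).items()):
        _real, user, sys_ = X13S[n]
        # sys/user grows when threads spend their time contending
        # in the kernel instead of doing link work.
        print(f"{n} threads: {s:.2f}x speedup, sys/user = {sys_ / user:.2f}")
    print("fastest:", best_threads(X13S), "threads")
```

On this data set the lowest real time is the 5-thread run, and the
sys/user ratio climbs noticeably from 6 threads on, which matches the
contention seen in top -H.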