timing lld --threads for fun and profit
On Fri, Nov 08, 2024 at 12:22:32PM +0100, Landry Breuil wrote:
> hi,
>
> so www/mozilla-firefox spends a considerable amount of time linking
> libxul.so, which is the monster library containing everything.
>
> right now the port uses:
> CONFIGURE_ENV += LDFLAGS="-Wl,--threads=${MAKE_JOBS} --ld-path=${WRKDIR}/bin/ld"
>
> which tells lld "use as many threads as cpus you have". mpi@ made me
> look at top -H during linking on 8/12/16-core machines, and we saw that
> only 20% of the time spent by all cores was actual "user" time, the rest
> of the time was wasted in %sys and %spin.
>
> so i instrumented the ld wrapper to actually time the lld process via a
> simple bsd.port.mk patch:
> -_LD_PROGRAM = /usr/bin/ld.lld
> +_LD_PROGRAM = /usr/bin/time /usr/bin/ld.lld
>
> and ran a firefox build loop, giving from 1 to 8 threads to the lld process.
>
> this gave me these numbers (first column is the number of lld threads):
> - on an amd64 kvm VM with 16 cores on a slow/old hw:
> 1 7148.53 real 6003.69 user 1037.50 sys
> 2 4365.36 real 6755.63 user 1665.59 sys
> 3 3263.71 real 7589.08 user 1613.45 sys
> 4 2626.83 real 7903.39 user 1841.40 sys
> 5 2847.14 real 9818.02 user 2420.12 sys
> 6 2883.57 real 11454.10 user 3338.28 sys
> 7 2426.03 real 10754.16 user 3358.06 sys
> 8 2334.40 real 11463.60 user 3788.33 sys
>
> - on the 8-core arm64 x13s:
> 1 5173.76 real 4538.51 user 613.66 sys
> 2 2750.27 real 4693.69 user 688.53 sys
> 3 1768.95 real 4382.10 user 717.15 sys
> 4 1407.99 real 4566.04 user 759.40 sys
> 5 1146.45 real 4438.68 user 856.93 sys
> 6 1152.81 real 4964.03 user 1228.58 sys
> 7 1466.82 real 6393.65 user 2460.71 sys
> 8 1235.53 real 6000.14 user 2420.20 sys
>
> - on the 12-core arm64 omnibook X14:
> 1 3984.44 real 3493.92 user 468.64 sys
> 2 2119.35 real 3582.04 user 536.62 sys
> 3 1522.39 real 3660.79 user 663.06 sys
> 4 1572.70 real 4227.21 user 1279.55 sys
> 5 1097.43 real 4091.75 user 876.22 sys
> 6 1085.70 real 4326.69 user 1289.30 sys
> 7 1369.48 real 4988.69 user 2934.72 sys
> 8 1222.77 real 5150.22 user 2945.51 sys
>
> from those numbers, apparently the 'sweet spot' for lld is to not use
> more than 5 threads, above that there's too much cpu contention and
> we're wasting too much time.
>
> someone(tm) should look into patching lld to avoid using more than
> MIN(ncpu,5) threads? In the meantime, i'll probably fix the firefox
> ports to avoid using MAKE_JOBS for lld and cap it at 5 instead.
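
As a rough way to read the tables above, here is a small Python sketch that computes the wall-clock speedup over the single-thread run and the per-thread efficiency for each machine. The "real" values are copied from the quoted mail; the dictionary keys and the function names `speedups` and `best_threads` are only illustrative and are not part of the original measurements.

```python
# Wall-clock ("real") seconds for 1..8 lld threads, copied from the quoted tables.
REAL_SECONDS = {
    "amd64 kvm VM, 16 cores": [7148.53, 4365.36, 3263.71, 2626.83,
                               2847.14, 2883.57, 2426.03, 2334.40],
    "arm64 x13s, 8 cores": [5173.76, 2750.27, 1768.95, 1407.99,
                            1146.45, 1152.81, 1466.82, 1235.53],
    "arm64 omnibook X14, 12 cores": [3984.44, 2119.35, 1522.39, 1572.70,
                                     1097.43, 1085.70, 1369.48, 1222.77],
}


def speedups(real):
    """Speedup of each run relative to the single-thread run."""
    return [real[0] / t for t in real]


def best_threads(real):
    """Thread count (1-based) with the lowest wall-clock time."""
    return min(range(len(real)), key=lambda i: real[i]) + 1


for machine, real in REAL_SECONDS.items():
    n = best_threads(real)
    s = speedups(real)[n - 1]
    # Efficiency is speedup divided by threads used; 1.0 would be perfect scaling.
    print(f"{machine}: fastest at {n} threads, "
          f"speedup {s:.2f}x, efficiency {s / n:.2f}")
```

This only looks at wall-clock time. The user and sys columns in the tables show the other half of the picture, namely how much extra CPU time the higher thread counts burn for little or no gain.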
Index: Makefile
===================================================================
RCS file: /cvs/ports/www/mozilla-firefox/Makefile,v
diff -u -p -r1.610 Makefile
--- Makefile 5 Nov 2024 09:10:37 -0000 1.610
+++ Makefile 8 Nov 2024 11:39:57 -0000
@@ -51,7 +50,12 @@ CONFIGURE_SCRIPT = ${MODPY_BIN} ${WRKSRC
CONFIGURE_ARGS += --prefix=${PREFIX}
MAKE_ENV += BUILD_VERBOSE_LOG="1" CARGOFLAGS="-j${MAKE_JOBS}"
CONFIGURE_ENV += CPPFLAGS=-Wno-backend-plugin
+NCPU !!= sysctl -n hw.ncpuonline
+.if ${NCPU} > 4
+CONFIGURE_ENV += LDFLAGS="-Wl,--threads=5 --ld-path=${WRKDIR}/bin/ld"
+.else
CONFIGURE_ENV += LDFLAGS="-Wl,--threads=${MAKE_JOBS} --ld-path=${WRKDIR}/bin/ld"
+.endif
this is a bit gross since i can't reuse MAKE_JOBS, which is only defined
after bsd.port.mk inclusion, but that's what i plan to commit to the
mozilla ports.
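
For readers who want the selection logic spelled out, here is a minimal Python sketch of what the conditional in the diff does, next to a plain "never more than 5" cap for comparison. Both function names are hypothetical, and the second variant is not what the patch implements; as noted above, MAKE_JOBS is not available at that point in the Makefile, which is why the diff tests the CPU count instead.

```python
def lld_threads_patch(ncpu, make_jobs, cap=5):
    """Mirror the diff: more than 4 online CPUs means a fixed cap,
    otherwise fall back to MAKE_JOBS."""
    return cap if ncpu > 4 else make_jobs


def lld_threads_min(make_jobs, cap=5):
    """Hypothetical alternative: MAKE_JOBS, but never more than the cap."""
    return min(make_jobs, cap)


# On a 16-cpu box building with MAKE_JOBS=16, both variants agree.
print(lld_threads_patch(16, 16), lld_threads_min(16))

# With MAKE_JOBS=2 on an 8-cpu box, the variants differ:
# the conditional from the diff still hands lld 5 threads.
print(lld_threads_patch(8, 2), lld_threads_min(2))
```

The two only diverge when MAKE_JOBS is set lower than the cap on a machine with more than 4 CPUs.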
Landry