From: Landry Breuil <landry@openbsd.org>
Subject: Re: timing lld --threads for fun and profit
To: tech@openbsd.org
Date: Fri, 8 Nov 2024 12:42:13 +0100

On Fri, Nov 08, 2024 at 12:22:32PM +0100, Landry Breuil wrote:
> hi,
> 
> so www/mozilla-firefox spends a considerable amount of time linking
> libxul.so, which is the monster library containing everything.
> 
> right now the port uses:
> CONFIGURE_ENV += LDFLAGS="-Wl,--threads=${MAKE_JOBS} --ld-path=${WRKDIR}/bin/ld"
> 
> which tells lld "use as many threads as cpus you have". mpi@ made me
> look at top -H during linking on 8/12/16-core machines, and we saw that
> only 20% of the time spent by all cores was actual "user" time, the rest
> of the time was wasted in %sys and %spin.
> 
> so i instrumented the ld wrapper to actually time the lld process via a
> simple bsd.port.mk patch:
> -_LD_PROGRAM = /usr/bin/ld.lld
> +_LD_PROGRAM = /usr/bin/time /usr/bin/ld.lld
> 
> and ran a firefox build loop, giving from 1 to 8 threads to the lld process.
> 
> this gave me these numbers (thread count, then /usr/bin/time output in
> seconds):
> - on an amd64 kvm VM with 16 cores on slow/old hw:
> 1     7148.53 real      6003.69 user      1037.50 sys
> 2     4365.36 real      6755.63 user      1665.59 sys
> 3     3263.71 real      7589.08 user      1613.45 sys
> 4     2626.83 real      7903.39 user      1841.40 sys
> 5     2847.14 real      9818.02 user      2420.12 sys
> 6     2883.57 real     11454.10 user      3338.28 sys
> 7     2426.03 real     10754.16 user      3358.06 sys
> 8     2334.40 real     11463.60 user      3788.33 sys
> 
> - on the 8-core arm64 x13s:
> 1     5173.76 real      4538.51 user       613.66 sys
> 2     2750.27 real      4693.69 user       688.53 sys
> 3     1768.95 real      4382.10 user       717.15 sys
> 4     1407.99 real      4566.04 user       759.40 sys
> 5     1146.45 real      4438.68 user       856.93 sys
> 6     1152.81 real      4964.03 user      1228.58 sys
> 7     1466.82 real      6393.65 user      2460.71 sys
> 8     1235.53 real      6000.14 user      2420.20 sys
> 
> - on the 12-core arm64 omnibook X14:
> 1     3984.44 real      3493.92 user       468.64 sys
> 2     2119.35 real      3582.04 user       536.62 sys
> 3     1522.39 real      3660.79 user       663.06 sys
> 4     1572.70 real      4227.21 user      1279.55 sys
> 5     1097.43 real      4091.75 user       876.22 sys
> 6     1085.70 real      4326.69 user      1289.30 sys
> 7     1369.48 real      4988.69 user      2934.72 sys
> 8     1222.77 real      5150.22 user      2945.51 sys
> 
> from those numbers, apparently the 'sweet spot' for lld is to not use
> more than 5 threads; above that there's too much cpu contention and
> we're wasting too much time.
> 
> someone(tm) should look into patching lld to avoid using more than
> MIN(ncpu,5) threads ? In the meantime, i'll probably fix the firefox
> ports to avoid using MAKE_JOBS for lld but cap it at 5.
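
To make the scaling in the quoted timings easier to read, here is a minimal sketch that turns the "real" column into speedup and parallel efficiency relative to the single-thread run. The function name speedup_table is hypothetical and not part of any port infrastructure; the input values are the x13s wall-clock times copied from the table above.

```shell
#!/bin/sh
# Wall-clock ("real") seconds for 1..8 lld threads on the 8-core arm64 x13s,
# copied from the quoted table.
x13s_real="5173.76 2750.27 1768.95 1407.99 1146.45 1152.81 1466.82 1235.53"

# speedup_table (hypothetical helper): given a space-separated list of real
# times where the Nth value is the run with N threads, print one line per run:
#   threads  speedup-vs-1-thread  efficiency (speedup / threads)
speedup_table() {
	echo "$1" | tr ' ' '\n' | awk '
		NR == 1 { base = $1 }
		{ printf "%d %.2f %.2f\n", NR, base / $1, base / $1 / NR }'
}

speedup_table "$x13s_real"
```

On these x13s numbers the speedup reaches roughly 4.5x at 5 threads and does not improve with more threads, which matches the "cap at 5" reading. The same function can be fed the real column of the other two machines.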

Index: Makefile
===================================================================
RCS file: /cvs/ports/www/mozilla-firefox/Makefile,v
diff -u -p -r1.610 Makefile
--- Makefile    5 Nov 2024 09:10:37 -0000       1.610
+++ Makefile    8 Nov 2024 11:39:57 -0000
@@ -51,7 +50,12 @@ CONFIGURE_SCRIPT =   ${MODPY_BIN} ${WRKSRC
 CONFIGURE_ARGS +=      --prefix=${PREFIX}
 MAKE_ENV +=            BUILD_VERBOSE_LOG="1" CARGOFLAGS="-j${MAKE_JOBS}"
 CONFIGURE_ENV +=       CPPFLAGS=-Wno-backend-plugin
+NCPU !!=               sysctl -n hw.ncpuonline
+.if ${NCPU} > 4
+CONFIGURE_ENV +=       LDFLAGS="-Wl,--threads=5 --ld-path=${WRKDIR}/bin/ld"
+.else
 CONFIGURE_ENV +=       LDFLAGS="-Wl,--threads=${MAKE_JOBS} --ld-path=${WRKDIR}/bin/ld"
+.endif

this is a bit gross since i can't reuse MAKE_JOBS in the .if condition,
as it is only defined after bsd.port.mk is included, but that's what i
plan to commit to the mozilla ports.

Landry