
From: Landry Breuil <landry@openbsd.org>
Subject: timing lld --threads for fun and profit
To: tech@openbsd.org
Date: Fri, 8 Nov 2024 12:22:32 +0100

hi,

so www/mozilla-firefox spends a considerable amount of time linking
libxul.so, which is the monster library containing everything.

right now the port uses:
CONFIGURE_ENV += LDFLAGS="-Wl,--threads=${MAKE_JOBS} --ld-path=${WRKDIR}/bin/ld"

which tells lld "use as many threads as cpus you have". mpi@ made me
look at top -H during linking on 8/12/16-core machines, and we saw that
only 20% of the time spent by all cores was actual "user" time; the rest
was wasted in %sys and %spin.

so i instrumented the ld wrapper to actually time the lld process via a
simple bsd.port.mk patch:
-_LD_PROGRAM = /usr/bin/ld.lld
+_LD_PROGRAM = /usr/bin/time /usr/bin/ld.lld

and ran a firefox build loop, giving from 1 to 8 threads to the lld process.

this gave me these numbers (lld thread count, then time(1) output in seconds):
- on an amd64 kvm VM with 16 cores on slow/old hw:
1     7148.53 real      6003.69 user      1037.50 sys
2     4365.36 real      6755.63 user      1665.59 sys
3     3263.71 real      7589.08 user      1613.45 sys
4     2626.83 real      7903.39 user      1841.40 sys
5     2847.14 real      9818.02 user      2420.12 sys
6     2883.57 real     11454.10 user      3338.28 sys
7     2426.03 real     10754.16 user      3358.06 sys
8     2334.40 real     11463.60 user      3788.33 sys

- on the 8-core arm64 x13s:
1     5173.76 real      4538.51 user       613.66 sys
2     2750.27 real      4693.69 user       688.53 sys
3     1768.95 real      4382.10 user       717.15 sys
4     1407.99 real      4566.04 user       759.40 sys
5     1146.45 real      4438.68 user       856.93 sys
6     1152.81 real      4964.03 user      1228.58 sys
7     1466.82 real      6393.65 user      2460.71 sys
8     1235.53 real      6000.14 user      2420.20 sys

- on the 12-core arm64 omnibook X14:
1     3984.44 real      3493.92 user       468.64 sys
2     2119.35 real      3582.04 user       536.62 sys
3     1522.39 real      3660.79 user       663.06 sys
4     1572.70 real      4227.21 user      1279.55 sys
5     1097.43 real      4091.75 user       876.22 sys
6     1085.70 real      4326.69 user      1289.30 sys
7     1369.48 real      4988.69 user      2934.72 sys
8     1222.77 real      5150.22 user      2945.51 sys

from those numbers, apparently the 'sweet spot' for lld is to not use
more than 5 threads, above that there's too much cpu contention and
we're wasting too much time.
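
The comparison behind that reading can be sketched in a few lines of sh and awk. This is only an illustration, not the script used for the measurements: the function names `x13s_timings` and `best_threads` are made up here, and the data is the x13s table copied verbatim from above.

```sh
# Hypothetical helper: emit the x13s timings exactly as listed above.
# Field 1 is the lld thread count, field 2 the wall-clock ("real") seconds.
x13s_timings() {
cat <<'EOF'
1     5173.76 real      4538.51 user       613.66 sys
2     2750.27 real      4693.69 user       688.53 sys
3     1768.95 real      4382.10 user       717.15 sys
4     1407.99 real      4566.04 user       759.40 sys
5     1146.45 real      4438.68 user       856.93 sys
6     1152.81 real      4964.03 user      1228.58 sys
7     1466.82 real      6393.65 user      2460.71 sys
8     1235.53 real      6000.14 user      2420.20 sys
EOF
}

# Hypothetical helper: read such a table on stdin and print the thread
# count with the lowest real time, plus its speedup over the first row.
best_threads() {
    awk '
        NR == 1  { base = $2; best = $1; min = $2 }
        $2 < min { best = $1; min = $2 }
        END      { printf "%d threads, %.2fx faster than 1 thread\n", best, base / min }
    '
}

x13s_timings | best_threads
# prints: 5 threads, 4.51x faster than 1 thread
```

Fed the other two tables, the same function picks 6 threads on the omnibook (1085.70 vs 1097.43 at 5) and 8 on the kvm VM, so 5 is an approximate sweet spot rather than an exact one.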

someone(tm) should look into patching lld to avoid using more than
MIN(ncpu,5) threads? In the meantime, i'll probably fix the firefox
ports to avoid using MAKE_JOBS for lld and cap it at 5 instead.
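
Capping at 5 amounts to taking the smaller of the job count and 5. A minimal sh sketch of that, assuming a made-up helper name `cap_threads` and a hard-coded job count of 16 for the example (the actual port change may well be done in make syntax instead):

```sh
# Hypothetical helper: print min(NCPU, CEILING); CEILING defaults to 5.
cap_threads() {
    ncpu=$1
    ceiling=${2:-5}
    if [ "$ncpu" -lt "$ceiling" ]; then
        echo "$ncpu"
    else
        echo "$ceiling"
    fi
}

# Example: the linker flag for a build running with 16 jobs.
LLD_THREADS=$(cap_threads 16)
echo "-Wl,--threads=${LLD_THREADS}"
# prints: -Wl,--threads=5
```

On a machine with fewer than 5 cpus the job count is passed through unchanged, e.g. `cap_threads 2` prints 2.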

Landry