
From: Landry Breuil <landry@openbsd.org>
Subject: Re: kernel rwlocks vs the scheduler
To: tech@openbsd.org
Date: Fri, 15 Nov 2024 09:42:58 +0100

On Wed, Nov 13, 2024 at 11:41:10PM +1000, David Gwynne wrote:
> i've been hitting this diff hard, but would appreciate more tests.
> rwlocks are a very fundamental part of the kernel so they need to
> work. benchmarks and witness tests are also welcome.

some numbers from my favourite benchmark, i.e. building firefox, this time
on the 12-core arm64 omnibook.

without the diff
================
only changing the number of threads allocated to lld, with MAKE_JOBS=12

1 threads for lld took 6684 (s), time for lld libxul: 3828.96 real 3347.41 user  451.48 sys
2 threads for lld took 4968 (s), time for lld libxul: 2037.03 real 3436.02 user  519.43 sys
3 threads for lld took 4886 (s), time for lld libxul: 2037.39 real 3980.19 user 1336.71 sys
4 threads for lld took 4418 (s), time for lld libxul: 1563.20 real 4209.59 user 1277.40 sys
5 threads for lld took 4632 (s), time for lld libxul: 1620.22 real 4730.67 user 2141.68 sys
6 threads for lld took 4237 (s), time for lld libxul: 1293.13 real 4739.12 user 1854.86 sys
7 threads for lld took 4082 (s), time for lld libxul: 1204.65 real 4979.91 user 2136.25 sys
8 threads for lld took 4154 (s), time for lld libxul: 1251.00 real 5328.76 user 2997.02 sys
all 640m04.18s real  3040m01.82s user  1285m49.42s system

fixed 5 threads for lld, changing MAKE_JOBS, in two batches, first 1 to 8 then 9 to 12:

MAKE_JOBS 1-8
 1043m35.20s real  2507m44.56s user   747m08.12s system
MAKE_JOBS 9-12
  297m41.30s real  1482m53.38s user   584m43.49s system
1  MAKE_JOBS & 5 threads for lld took 17974 (s), time for the whole build:  299m33.63s real   299m25.49s user    63m59.88s system
2  MAKE_JOBS & 5 threads for lld took 9751 (s), time for the whole build:   162m31.17s real   296m23.57s user    60m19.34s system
3  MAKE_JOBS & 5 threads for lld took 7218 (s), time for the whole build:   120m17.75s real   298m07.04s user    67m27.90s system
4  MAKE_JOBS & 5 threads for lld took 7196 (s), time for the whole build:   119m56.49s real   317m28.81s user   131m05.36s system
5  MAKE_JOBS & 5 threads for lld took 5756 (s), time for the whole build:    95m55.22s real   314m53.99s user   101m20.27s system
6  MAKE_JOBS & 5 threads for lld took 5348 (s), time for the whole build:    89m07.90s real   320m45.14s user   110m36.11s system
7  MAKE_JOBS & 5 threads for lld took 4533 (s), time for the whole build:    75m32.59s real   324m56.65s user    96m35.54s system
8  MAKE_JOBS & 5 threads for lld took 4486 (s), time for the whole build:    74m45.29s real   335m40.38s user   112m00.79s system
9  MAKE_JOBS & 5 threads for lld took 4803 (s), time for the whole build:    80m02.76s real   352m22.68s user   145m18.24s system
10 MAKE_JOBS & 5 threads for lld took 4294 (s), time for the whole build:    71m33.42s real   360m47.93s user   132m11.32s system
11 MAKE_JOBS & 5 threads for lld took 4208 (s), time for the whole build:    70m07.91s real   370m18.35s user   141m51.38s system
12 MAKE_JOBS & 5 threads for lld took 4380 (s), time for the whole build:    72m59.61s real   399m21.73s user   163m25.45s system
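the rows mix a plain-seconds 'took N (s)' figure with time(1)'s XmY.ZZs format. converting the latter suggests the two agree, e.g. 299m33.63s is 17973.63 s against 'took 17974 (s)' on the first row. a small sh/awk sketch of that conversion (to_seconds is a hypothetical helper, not something from the ports tree):

```sh
#!/bin/sh
# to_seconds: hypothetical helper turning a time(1)-style duration such as
# "299m33.63s" (or just "35.20s") into plain seconds.
to_seconds() {
        echo "$1" | awk '{
                s = $0; m = 0
                if (index(s, "m") > 0) {
                        split(s, p, "m")        # p[1] = minutes, p[2] = seconds + "s"
                        m = p[1]; s = p[2]
                }
                sub(/s$/, "", s)                # drop the trailing "s"
                printf "%.2f\n", m * 60 + s
        }'
}

# first row of the table above: "took 17974 (s)" vs "299m33.63s real"
to_seconds 299m33.63s        # prints 17973.63
```

likewise, to_seconds 162m31.17s prints 9751.17, matching 'took 9751 (s)' on the second row.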


with the rwlocks diff & the futex diff:
=======================================

changing the number of threads for lld, with MAKE_JOBS=12

1 threads for lld took 6764 (s), time for lld libxul: 3843.77 real 3360.01 user  460.74 sys
2 threads for lld took 4998 (s), time for lld libxul: 2040.58 real 3442.94 user  520.94 sys
3 threads for lld took 4554 (s), time for lld libxul: 1515.08 real 3664.65 user  635.89 sys
4 threads for lld took 4276 (s), time for lld libxul: 1213.37 real 3742.56 user  728.96 sys
5 threads for lld took 4293 (s), time for lld libxul: 1241.74 real 4261.79 user 1146.74 sys
6 threads for lld took 4121 (s), time for lld libxul: 1129.83 real 4689.45 user 1186.21 sys
7 threads for lld took 4101 (s), time for lld libxul: 1085.05 real 4877.47 user 1408.21 sys
8 threads for lld took 4479 (s), time for lld libxul: 1536.75 real 5639.95 user 4266.23 sys
all 632m09.59s real  3164m59.79s user  1278m32.98s system

one can clearly see sys time going down for 3 to 7 lld threads, though the
8-thread run went the other way.
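to put numbers on the sys time change, here's a small sh/awk sketch that lines up the sys column of the two lld thread tables (values copied by hand from the 'time for lld libxul' rows; compare_sys is a hypothetical helper) and prints the with/without ratio:

```sh
#!/bin/sh
# compare_sys: hypothetical helper. Input columns are lld threads,
# sys without the diffs, sys with the diffs (copied from the tables above).
# Output adds a 4th column: sys(with) / sys(without).
compare_sys() {
        awk '{ printf "%d %.2f %.2f %.2f\n", $1, $2, $3, $3 / $2 }' <<'EOF'
1 451.48 460.74
2 519.43 520.94
3 1336.71 635.89
4 1277.40 728.96
5 2141.68 1146.74
6 1854.86 1186.21
7 2136.25 1408.21
8 2997.02 4266.23
EOF
}

compare_sys
```

with these numbers the ratio sits between about 0.48 and 0.66 for 3 to 7 threads, around 1.00 for 1 and 2 threads, and 1.42 for 8 threads.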

fixed 5 threads for lld and varying MAKE_JOBS (sadly the machine rebooted at the 11th step, so that run had a cold start)

1  MAKE_JOBS & 5 threads for lld took 19645 (s); time for the whole build:  327m25.37s real   328m36.28s user    66m23.38s system
2  MAKE_JOBS & 5 threads for lld took 10545 (s); time for the whole build:  175m45.04s real   322m25.95s user    62m29.85s system
3  MAKE_JOBS & 5 threads for lld took 8262 (s); time for the whole build:   137m42.58s real   329m12.21s user    91m00.57s system
4  MAKE_JOBS & 5 threads for lld took 6552 (s); time for the whole build:   109m12.23s real   319m55.94s user    84m52.69s system
5  MAKE_JOBS & 5 threads for lld took 5638 (s); time for the whole build:    93m57.18s real   317m59.67s user    90m56.84s system
6  MAKE_JOBS & 5 threads for lld took 4983 (s); time for the whole build:    83m03.73s real   311m47.01s user    93m32.52s system
7  MAKE_JOBS & 5 threads for lld took 4946 (s); time for the whole build:    82m25.85s real   334m05.81s user   107m57.99s system
8  MAKE_JOBS & 5 threads for lld took 4664 (s); time for the whole build:    77m44.18s real   336m03.62s user   122m17.10s system
9  MAKE_JOBS & 5 threads for lld took 4451 (s); time for the whole build:    74m11.22s real   351m13.66s user   122m04.53s system
10 MAKE_JOBS & 5 threads for lld took 4183 (s); time for the whole build:    69m43.65s real   358m26.97s user   130m38.95s system
11 MAKE_JOBS & 5 threads for lld took 4495 (s); time for the whole build:    74m54.52s real   396m28.60s user   153m21.54s system
12 MAKE_JOBS & 5 threads for lld took 4599 (s); time for the whole build:    76m38.98s real   373m50.41s user   170m15.96s system

i can't make much of those numbers: the machine was only building firefox,
so apart from top running in tmux there should have been no variation from
the external environment, yet there are some inconsistencies i can't explain
(e.g. without the diff, 9 MAKE_JOBS took 4803s while 8 took 4486s).

note that all those numbers come from 'time make package', where the
extract/configure/fake/package steps are fully serial, so i'll rerun the
experiment timing only the 'make build' step, which should give 'better'
numbers as that step takes full advantage of parallelism.
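a minimal sketch of that build-step-only measurement, assuming an OpenBSD ports directory and the usual configure/build targets; time_build_step is a hypothetical helper, MAKE can be overridden (MAKE=echo gives a dry run), and date +%s stands in for time(1) so each result lands on a single log line:

```sh
#!/bin/sh
# Sketch: time only the parallel "make build" step of a port, keeping the
# serial fetch/extract/patch/configure steps outside the measured window.
# Assumption: run from a port directory such as /usr/ports/www/mozilla-firefox.
: "${MAKE:=make}"

time_build_step() {
        n=$1
        $MAKE clean=all
        $MAKE configure                 # serial steps, not timed
        start=$(date +%s)
        env MAKE_JOBS="$n" $MAKE build  # the parallel step being measured
        end=$(date +%s)
        echo "$n MAKE_JOBS: build step took $((end - start)) (s)"
}

# usage, from the port directory:
#   for N in $(jot 12); do time_build_step "$N" 2>&1 | tee build-step-$N; done
```

keeping 'make configure' outside the window means the serial steps still run, they just don't dilute the parallel speedup being compared.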

i can also clearly see that it makes no sense to default MAKE_JOBS to ncpuonline;
it's way better to set PARALLEL_MAKE_JOBS to no more than half of ncpuonline,
at least that's what i ended up with on my build vm.
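a minimal sketch of the 'no more than half of ncpuonline' rule, assuming the value is then put in /etc/mk.conf as PARALLEL_MAKE_JOBS; half_jobs is a hypothetical helper that rounds down and never goes below 1:

```sh
#!/bin/sh
# half_jobs: hypothetical helper returning ncpu / 2, rounded down, minimum 1.
half_jobs() {
        n=$(( $1 / 2 ))
        [ "$n" -lt 1 ] && n=1
        echo "$n"
}

# on OpenBSD the online CPU count comes from sysctl, e.g.:
#   echo "PARALLEL_MAKE_JOBS=$(half_jobs "$(sysctl -n hw.ncpuonline)")"
# which would print PARALLEL_MAKE_JOBS=6 on a 12-core machine.
```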

if one wants to reproduce the thread experiment, here's more or less the script:
cd /usr/ports/www/mozilla-firefox
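for N in $(jot 12); do
        make clean=all
        # set lld's --threads=N in the port Makefile; matching all digits
        # keeps N >= 10 from turning e.g. --threads=10 into --threads=110
        sed -i -e "s/--threads=[0-9][0-9]*/--threads=$N/" Makefile
        touch build-${N}-start
        time make package 2>&1 | tee build-${N}-end
done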
make clean=all

and for the MAKE_JOBS runs (lld fixed at 5 threads), use 'sh -c "time env MAKE_JOBS=$N make package"' in the loop to measure build time.

Landry