From: Mateusz Guzik <mjguzik@gmail.com>
Subject: Re: [PATCH] convert mpl ticket lock to anderson's lock
To: Mateusz Guzik <mjguzik@gmail.com>, tech@openbsd.org
Date: Tue, 17 Feb 2026 11:53:53 +0100

On Tue, Feb 17, 2026 at 9:23 AM Martin Pieuchot <mpi@grenadille.net> wrote:
>
> On 27/12/25(Sat) 06:24, Mateusz Guzik wrote:
> > Example microbenchmark doing fstat in 8 threads (ops/s):
> > before: 2305501
> > after: 5891611 (+155%)
> >
> > Booted with WITNESS et al, survived several kernel builds no problem.
> >
> > Needs more testing of course.
>
> Here are some numbers building a kernel and libLLVM on an amd64 i9 w/ 24
> cores and an arm64 Ampere Altra w/ 80 cores.
>
> vanilla - amd64 - kernel -j24
> =============================
>     0m47.40s real    10m41.77s user     4m00.12s system         (cold)
>     0m48.19s real    10m59.92s user     4m00.97s system
>     0m49.22s real    11m19.23s user     4m06.63s system
>
> anderson - amd64 - kernel -j24
> ==============================
>     0m48.01s real    10m53.56s user     4m03.85s system         (cold)
>     0m49.65s real    11m26.71s user     4m07.10s system
>     0m50.00s real    11m36.01s user     4m07.08s system
>
> vanilla - arm64 - kernel -j48
> =============================
>     1m10.41s real    14m06.83s user    24m28.19s system         (cold)
>     1m11.21s real    14m01.66s user    24m37.54s system
>     1m11.37s real    14m04.17s user    24m33.37s system
>
> anderson - arm64 - kernel -j48
> ==============================
>     1m11.02s real    13m59.98s user    24m42.46s system         (cold)
>     1m11.48s real    14m01.83s user    24m43.20s system
>     1m11.54s real    14m03.91s user    24m41.63s system
>
> vanilla - amd64 - libLLVM -j24
> ==============================
>    11m51.87s real   240m05.22s user    34m45.38s system         (cold)
>    11m55.73s real   240m17.56s user    35m47.06s system
>    12m02.38s real   241m32.83s user    36m37.51s system
>
> anderson - amd64 - libLLVM -j24
> ===============================
>    11m56.16s real   241m59.39s user    34m52.01s system         (cold)
>    11m47.98s real   237m43.06s user    35m31.95s system
>    11m56.02s real   239m36.04s user    36m06.48s system
>
> vanilla - arm64 - libLLVM -j48
> ==============================
>    18m37.50s real   569m58.77s user   267m49.27s system
>    18m44.55s real   570m28.07s user   270m17.30s system
>    18m45.46s real   569m55.88s user   271m13.74s system
>
> anderson - arm64 - libLLVM -j48
> ===============================
>    18m22.51s real   569m20.67s user   261m34.81s system
>    18m31.35s real   571m37.11s user   262m50.73s system
>    18m34.12s real   569m40.94s user   266m25.65s system
>
>

Seeing no speedup at all when building at lower scale lines up with my
own results, which I mentioned here:
https://marc.info/?l=openbsd-tech&m=176631121132731&w=2

Per that post, the primary problem concerns page allocation and the
way mutexes are implemented.

The speedup at bigger scale is presumably small only because of the
same problem; if that problem did not exist, the speedup would be
bigger.
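
To put rough numbers on "no speedup at lower scale" and "small speedup
at bigger scale", here is a minimal sketch that averages the "real"
column of two of the quoted result sets and reports the percent change.
The helper names (to_seconds, relative_change) are made up for
illustration, and averaging all three runs, including the one marked
cold, is an assumption rather than how the numbers were originally
meant to be compared.

```python
import re
from statistics import mean


def to_seconds(stamp: str) -> float:
    """Convert a time(1)-style value such as '18m37.50s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def relative_change(before, after) -> float:
    """Percent change in mean wall-clock time; negative means faster."""
    base = mean(to_seconds(s) for s in before)
    new = mean(to_seconds(s) for s in after)
    return (new - base) / base * 100.0


# "real" columns copied from the quoted results above.
ARM64_LLVM_VANILLA = ["18m37.50s", "18m44.55s", "18m45.46s"]
ARM64_LLVM_ANDERSON = ["18m22.51s", "18m31.35s", "18m34.12s"]
AMD64_KERNEL_VANILLA = ["0m47.40s", "0m48.19s", "0m49.22s"]
AMD64_KERNEL_ANDERSON = ["0m48.01s", "0m49.65s", "0m50.00s"]

if __name__ == "__main__":
    llvm = relative_change(ARM64_LLVM_VANILLA, ARM64_LLVM_ANDERSON)
    kern = relative_change(AMD64_KERNEL_VANILLA, AMD64_KERNEL_ANDERSON)
    print(f"arm64 libLLVM -j48: {llvm:+.2f}% real time")
    print(f"amd64 kernel -j24:  {kern:+.2f}% real time")
```

With three runs per configuration this is only indicative: the arm64
libLLVM build comes out a bit over 1% faster with the patch, while the
amd64 kernel build is about 2% slower, which is plausibly within
run-to-run noise.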