From:
Mateusz Guzik <mjguzik@gmail.com>
Subject:
Re: [PATCH] convert mpl ticket lock to anderson's lock
To:
Mateusz Guzik <mjguzik@gmail.com>, tech@openbsd.org
Date:
Tue, 17 Feb 2026 14:50:52 +0100

On Tue, Feb 17, 2026 at 11:53 AM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> On Tue, Feb 17, 2026 at 9:23 AM Martin Pieuchot <mpi@grenadille.net> wrote:
> >
> > On 27/12/25(Sat) 06:24, Mateusz Guzik wrote:
> > > Example microbenchmark doing fstat in 8 threads (ops/s):
> > > before: 2305501
> > > after: 5891611 (+155%)
> > >
> > > Booted with WITNESS et al, survived several kernel builds no problem.
> > >
> > > Needs more testing of course.
> >
> > Here are some numbers building a kernel and libLLVM on an amd64 i9 w/ 24
> > cores and an arm64 Ampere Altra w/ 80 cores.
> >
> > vanilla - amd64 - kernel -j24
> > =============================
> >     0m47.40s real    10m41.77s user     4m00.12s system         (cold)
> >     0m48.19s real    10m59.92s user     4m00.97s system
> >     0m49.22s real    11m19.23s user     4m06.63s system
> >
> > anderson - amd64 - kernel -j24
> > ==============================
> >     0m48.01s real    10m53.56s user     4m03.85s system         (cold)
> >     0m49.65s real    11m26.71s user     4m07.10s system
> >     0m50.00s real    11m36.01s user     4m07.08s system
> >
> > vanilla - arm64 - kernel -j48
> > =============================
> >     1m10.41s real    14m06.83s user    24m28.19s system         (cold)
> >     1m11.21s real    14m01.66s user    24m37.54s system
> >     1m11.37s real    14m04.17s user    24m33.37s system
> >
> > anderson - arm64 - kernel -j48
> > ==============================
> >     1m11.02s real    13m59.98s user    24m42.46s system         (cold)
> >     1m11.48s real    14m01.83s user    24m43.20s system
> >     1m11.54s real    14m03.91s user    24m41.63s system
> >
> > vanilla - amd64 - libLLVM -j24
> > ==============================
> >    11m51.87s real   240m05.22s user    34m45.38s system         (cold)
> >    11m55.73s real   240m17.56s user    35m47.06s system
> >    12m02.38s real   241m32.83s user    36m37.51s system
> >
> > anderson - amd64 - libLLVM -j24
> > ===============================
> >    11m56.16s real   241m59.39s user    34m52.01s system         (cold)
> >    11m47.98s real   237m43.06s user    35m31.95s system
> >    11m56.02s real   239m36.04s user    36m06.48s system
> >
> > vanilla - arm64 - libLLVM -j48
> > ==============================
> >    18m37.50s real   569m58.77s user   267m49.27s system
> >    18m44.55s real   570m28.07s user   270m17.30s system
> >    18m45.46s real   569m55.88s user   271m13.74s system
> >
> > anderson - arm64 - libLLVM -j48
> > ===============================
> >    18m22.51s real   569m20.67s user   261m34.81s system
> >    18m31.35s real   571m37.11s user   262m50.73s system
> >    18m34.12s real   569m40.94s user   266m25.65s system
> >
> >
>
> No speed ups at all when building at lower scale lines up with my own
> results, which I mentioned here:
> https://marc.info/?l=openbsd-tech&m=176631121132731&w=2
>
> Per that post, the primary problem concerns page allocation and the
> way mutexes are implemented.
>
> The small speed up at bigger scale is presumably also only there
> because of the above problem -- if it did not exist, the speed up
> would be bigger.

I guess a case for inclusion can be made by comparing profiles taken
before and after with -j 48. Given the win, I expect time spent on
kernel_lock dropped significantly, but the change also increased
cache-bouncing overhead elsewhere, which in turn artificially lowered
the win.
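
For anyone following along who has not looked at the scheme itself, here
is a minimal user-space sketch of an Anderson-style array lock. It is not
the code from the patch: it uses C11 atomics and pthreads instead of
kernel primitives, and NSLOTS, CACHELINE, struct anderson_lock and all
function names are made up for illustration. The point it tries to show
is that each waiter spins on its own cache line, whereas with a ticket
lock all waiters spin on the same word.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NSLOTS    8	/* assumption: >= max concurrent threads, power of 2 */
#define CACHELINE 64	/* assumption: cache line size in bytes */

/* One flag per slot, each on its own cache line. */
struct slot {
	_Alignas(CACHELINE) atomic_bool can_enter;
};

struct anderson_lock {
	struct slot slots[NSLOTS];
	_Alignas(CACHELINE) atomic_uint next;	/* next slot to hand out */
};

static void
anderson_init(struct anderson_lock *l)
{
	/* Only slot 0 starts out open, so the first taker gets the lock. */
	for (int i = 0; i < NSLOTS; i++)
		atomic_init(&l->slots[i].can_enter, i == 0);
	atomic_init(&l->next, 0);
}

/* Take a slot, spin on it alone, and return its index for the release. */
static unsigned
anderson_acquire(struct anderson_lock *l)
{
	unsigned me = atomic_fetch_add_explicit(&l->next, 1,
	    memory_order_relaxed) % NSLOTS;

	while (!atomic_load_explicit(&l->slots[me].can_enter,
	    memory_order_acquire))
		sched_yield();	/* user-space demo only; a kernel would spin */

	/* Close our slot again so it can be reused on the next lap. */
	atomic_store_explicit(&l->slots[me].can_enter, false,
	    memory_order_relaxed);
	return me;
}

/* Hand the lock to whoever owns the following slot. */
static void
anderson_release(struct anderson_lock *l, unsigned me)
{
	atomic_store_explicit(&l->slots[(me + 1) % NSLOTS].can_enter, true,
	    memory_order_release);
}

/* Demo harness: threads bump a plain counter under the lock. */
struct demo {
	struct anderson_lock lock;
	long counter;
	int iters;
};

static void *
worker(void *arg)
{
	struct demo *d = arg;

	for (int i = 0; i < d->iters; i++) {
		unsigned me = anderson_acquire(&d->lock);
		d->counter++;
		anderson_release(&d->lock, me);
	}
	return NULL;
}

/* Runs nthreads workers and returns the final counter value. */
static long
anderson_demo(int nthreads, int iters)
{
	static struct demo d;
	pthread_t t[NSLOTS];

	assert(nthreads > 0 && nthreads <= NSLOTS);
	anderson_init(&d.lock);
	d.counter = 0;
	d.iters = iters;
	for (int i = 0; i < nthreads; i++)
		pthread_create(&t[i], NULL, worker, &d);
	for (int i = 0; i < nthreads; i++)
		pthread_join(t[i], NULL);
	return d.counter;
}
```

Calling anderson_demo(4, 2000) should return 8000 if mutual exclusion
holds. The release only dirties the successor's line, but the lock
holder's own data can still bounce between CPUs, which is the kind of
overhead a before/after profile would show.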