Index | Thread | Search

From:
Alexander Bluhm <alexander.bluhm@gmx.net>
Subject:
Re: improve spinning in mtx_enter
To:
Mateusz Guzik <mjguzik@gmail.com>
Cc:
tech@openbsd.org
Date:
Thu, 21 Mar 2024 13:22:48 +0100

Download raw body.

Thread
On Wed, Mar 20, 2024 at 01:43:18PM +0100, Mateusz Guzik wrote:
> few years back I noted that migration from MD mutex code to MI
> implementation happened to regress performance at least on amd64.
> 
> The MI implementation fails to check if the lock is free before issuing
> CAS, and only spins once before attempts. This avoidably reduces
> performance.
> 
> While more can be done here, bare minimun the code needs to NULL check,
> which I implemented below.
> 
> Results on 7.5 (taken from snapshots) timing make -ss -j 16 in the
> kernel dir:
> 
> before: 521.37s user 524.69s system 1080% cpu 1:36.79 total
> after:  522.76s user 486.87s system 1088% cpu 1:32.79 total
> 
> That is about 4% reduction in total real time for a rather trivial
> change.

I refactored the diff a bit to keep the while loop.

With that I get 5% reduction in sys time during kernel build on a
8 CPU machine.  It has two sockets with 4 cores each.

Flame graphs of kernel build are here:
http://bluhm.genua.de/perform/results/2024-03-19T23:41:35Z/2024-03-19T00%3A00%3A00Z/btrace/time_-lp_make_-CGENERIC.MP_-j8_-s-btrace-kstack.0.svg
http://bluhm.genua.de/perform/results/2024-03-19T23:41:35Z/patch-sys-mutex-spin.1/btrace/time_-lp_make_-CGENERIC.MP_-j8_-s-btrace-kstack.0.svg

If you zoom into kernel and search for mtx_enter, you see that
spinning time goes down from 13.1% to 10.7%

Should we commit this?

bluhm

Index: kern/kern_lock.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_lock.c,v
diff -u -p -r1.72 kern_lock.c
--- kern/kern_lock.c	26 Apr 2022 15:31:14 -0000	1.72
+++ kern/kern_lock.c	21 Mar 2024 12:11:20 -0000
@@ -264,15 +264,17 @@ mtx_enter(struct mutex *mtx)
 
 	spc->spc_spinning++;
 	while (mtx_enter_try(mtx) == 0) {
-		CPU_BUSY_CYCLE();
-
+		do {
+			CPU_BUSY_CYCLE();
 #ifdef MP_LOCKDEBUG
-		if (--nticks == 0) {
-			db_printf("%s: %p lock spun out\n", __func__, mtx);
-			db_enter();
-			nticks = __mp_lock_spinout;
-		}
+			if (--nticks == 0) {
+				db_printf("%s: %p lock spun out\n",
+				    __func__, mtx);
+				db_enter();
+				nticks = __mp_lock_spinout;
+			}
 #endif
+		} while (mtx->mtx_owner != NULL);
 	}
 	spc->spc_spinning--;
 }