From: Mateusz Guzik <mjguzik@gmail.com>
Subject: improve spinning in mtx_enter
To: tech@openbsd.org
Date: Wed, 20 Mar 2024 13:43:18 +0100

Hello,

A few years back I noted that the migration from the MD mutex code to the
MI implementation happened to regress performance, at least on amd64.

The MI implementation fails to check whether the lock is free before
issuing a CAS, and it spins only once between attempts. This avoidably
reduces performance.
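
For illustration only, here is a minimal user-space sketch of the
test-and-test-and-set pattern the change moves towards: the CAS is only
attempted once a plain load has observed the lock to be free. The struct,
function, and parameter names below are made up for the example and are
not OpenBSD kernel APIs.

#include <stdatomic.h>
#include <stddef.h>

/* Illustrative lock; "owner" is NULL while the lock is free. */
struct spin_mtx {
	_Atomic(void *) owner;
};

static void
spin_mtx_enter(struct spin_mtx *m, void *self)
{
	void *expected;

	for (;;) {
		/* Only issue the expensive atomic when the lock looks free. */
		expected = NULL;
		if (atomic_compare_exchange_weak(&m->owner, &expected, self))
			return;

		/*
		 * Spin on a plain read while the lock is held; this keeps
		 * the cache line in shared state instead of bouncing it
		 * around with repeated CAS attempts.
		 */
		while (atomic_load_explicit(&m->owner,
		    memory_order_relaxed) != NULL)
			;	/* the kernel would call CPU_BUSY_CYCLE() here */
	}
}

In the diff below the same effect comes from breaking out of the inner
busy-wait loop only once mtx->mtx_owner is observed to be NULL, and then
retrying mtx_enter_try().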

While more can be done here, at a bare minimum the code needs the NULL
check, which I implemented below.

Results on 7.5 (taken from snapshots), timing 'make -ss -j 16' in the
kernel dir:

before: 521.37s user 524.69s system 1080% cpu 1:36.79 total
after:  522.76s user 486.87s system 1088% cpu 1:32.79 total

That is about a 4% reduction in total real time for a rather trivial
change.

diff --git a/sys/kern/kern_lock.c b/sys/kern/kern_lock.c
index b21e1aa5542..6a185b979fa 100644
--- a/sys/kern/kern_lock.c
+++ b/sys/kern/kern_lock.c
@@ -263,16 +263,22 @@ mtx_enter(struct mutex *mtx)
 	    LOP_EXCLUSIVE | LOP_NEWORDER, NULL);
 
 	spc->spc_spinning++;
-	while (mtx_enter_try(mtx) == 0) {
-		CPU_BUSY_CYCLE();
+	for (;;) {
+		if (mtx_enter_try(mtx) != 0)
+			break;
 
+		for (;;) {
+			CPU_BUSY_CYCLE();
 #ifdef MP_LOCKDEBUG
-		if (--nticks == 0) {
-			db_printf("%s: %p lock spun out\n", __func__, mtx);
-			db_enter();
-			nticks = __mp_lock_spinout;
-		}
+			if (--nticks == 0) {
+				db_printf("%s: %p lock spun out\n", __func__, mtx);
+				db_enter();
+				nticks = __mp_lock_spinout;
+			}
 #endif
+			if (mtx->mtx_owner == NULL)
+				break;
+		}
 	}
 	spc->spc_spinning--;
 }