From: Mateusz Guzik
Subject: improve spinning in mtx_enter
To: tech@openbsd.org
Date: Wed, 20 Mar 2024 13:43:18 +0100

Hello,

A few years back I noted that the migration from the MD mutex code to
the MI implementation regressed performance, at least on amd64.

The MI implementation fails to check whether the lock is free before
issuing a CAS, and executes only a single CPU_BUSY_CYCLE() between
attempts.  This avoidably reduces performance: a CAS issued against a
lock which is visibly owned cannot succeed and only generates cacheline
traffic.

While more can be done here, at bare minimum the code needs the NULL
check before retrying, which I implemented below.

Results on 7.5 (taken from snapshots), timing make -ss -j 16 in the
kernel dir:

before: 521.37s user 524.69s system 1080% cpu 1:36.79 total
after:  522.76s user 486.87s system 1088% cpu 1:32.79 total

That is about a 4% reduction in total real time for a rather trivial
change.

diff --git a/sys/kern/kern_lock.c b/sys/kern/kern_lock.c
index b21e1aa5542..6a185b979fa 100644
--- a/sys/kern/kern_lock.c
+++ b/sys/kern/kern_lock.c
@@ -263,16 +263,22 @@ mtx_enter(struct mutex *mtx)
 	    LOP_EXCLUSIVE | LOP_NEWORDER, NULL);
 
 	spc->spc_spinning++;
-	while (mtx_enter_try(mtx) == 0) {
-		CPU_BUSY_CYCLE();
+	for (;;) {
+		if (mtx_enter_try(mtx) != 0)
+			break;
+		for (;;) {
+			CPU_BUSY_CYCLE();
 
 #ifdef MP_LOCKDEBUG
-		if (--nticks == 0) {
-			db_printf("%s: %p lock spun out\n", __func__, mtx);
-			db_enter();
-			nticks = __mp_lock_spinout;
-		}
+			if (--nticks == 0) {
+				db_printf("%s: %p lock spun out\n", __func__, mtx);
+				db_enter();
+				nticks = __mp_lock_spinout;
+			}
 #endif
+			if (mtx->mtx_owner == NULL)
+				break;
+		}
 	}
 	spc->spc_spinning--;
 }
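
For context, the loop above is the classic test-and-test-and-set
scheme: wait with plain loads until the lock looks free, and only then
pay for another CAS.  Below is a minimal standalone sketch of the same
idea using C11 atomics; the names (spinlock_t, spin_lock, spin_trylock)
are illustrative and are not the kernel's mutex API.

/*
 * Minimal test-and-test-and-set sketch using C11 atomics.
 * Illustrative only; not the kernel's mutex implementation.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdatomic.h>

typedef struct {
	atomic_uintptr_t owner;		/* 0 means the lock is free */
} spinlock_t;

static bool
spin_trylock(spinlock_t *l, uintptr_t self)
{
	uintptr_t unowned = 0;

	/* Single CAS attempt; fails if another owner is installed. */
	return atomic_compare_exchange_strong(&l->owner, &unowned, self);
}

static void
spin_lock(spinlock_t *l, uintptr_t self)
{
	for (;;) {
		if (spin_trylock(l, self))
			return;
		/*
		 * Wait with plain loads until the lock looks free,
		 * then loop back and retry the CAS.
		 */
		while (atomic_load_explicit(&l->owner,
		    memory_order_relaxed) != 0)
			;	/* would be CPU_BUSY_CYCLE() in the kernel */
	}
}

The point of the relaxed load in the wait loop is that it keeps hitting
the waiter's local cache until the owner actually releases the lock,
whereas a failed CAS keeps pulling the cacheline in exclusive state on
every iteration.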