From: Martin Pieuchot Subject: Re: improve spinning in mtx_enter To: Alexander Bluhm Cc: Mateusz Guzik , tech@openbsd.org Date: Sun, 24 Mar 2024 11:09:12 +0100 On 21/03/24(Thu) 13:22, Alexander Bluhm wrote: > On Wed, Mar 20, 2024 at 01:43:18PM +0100, Mateusz Guzik wrote: > > few years back I noted that migration from MD mutex code to MI > > implementation happened to regress performance at least on amd64. > > > > The MI implementation fails to check if the lock is free before issuing > > CAS, and only spins once before attempts. This avoidably reduces > > performance. > > > > While more can be done here, bare minimun the code needs to NULL check, > > which I implemented below. > > > > Results on 7.5 (taken from snapshots) timing make -ss -j 16 in the > > kernel dir: > > > > before: 521.37s user 524.69s system 1080% cpu 1:36.79 total > > after: 522.76s user 486.87s system 1088% cpu 1:32.79 total > > > > That is about 4% reduction in total real time for a rather trivial > > change. > > I refactored the diff a bit to keep the while loop. > > With that I get 5% reduction in sys time during kernel build on a > 8 CPU machine. It has two sockets with 4 cores each. > > Flame graphs of kernel build are here: > http://bluhm.genua.de/perform/results/2024-03-19T23:41:35Z/2024-03-19T00%3A00%3A00Z/btrace/time_-lp_make_-CGENERIC.MP_-j8_-s-btrace-kstack.0.svg > http://bluhm.genua.de/perform/results/2024-03-19T23:41:35Z/patch-sys-mutex-spin.1/btrace/time_-lp_make_-CGENERIC.MP_-j8_-s-btrace-kstack.0.svg > > If you zoom into kernel and search for mtx_enter, you see that > spinning time goes down from 13.1% to 10.7% > > Should we commit this? ok mpi@ > Index: kern/kern_lock.c > =================================================================== > RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_lock.c,v > diff -u -p -r1.72 kern_lock.c > --- kern/kern_lock.c 26 Apr 2022 15:31:14 -0000 1.72 > +++ kern/kern_lock.c 21 Mar 2024 12:11:20 -0000 > @@ -264,15 +264,17 @@ mtx_enter(struct mutex *mtx) > > spc->spc_spinning++; > while (mtx_enter_try(mtx) == 0) { > - CPU_BUSY_CYCLE(); > - > + do { > + CPU_BUSY_CYCLE(); > #ifdef MP_LOCKDEBUG > - if (--nticks == 0) { > - db_printf("%s: %p lock spun out\n", __func__, mtx); > - db_enter(); > - nticks = __mp_lock_spinout; > - } > + if (--nticks == 0) { > + db_printf("%s: %p lock spun out\n", > + __func__, mtx); > + db_enter(); > + nticks = __mp_lock_spinout; > + } > #endif > + } while (mtx->mtx_owner != NULL); > } > spc->spc_spinning--; > } >