From: Scott Cheloha
Subject: dt(4): profile: don't stagger clock interrupts
To: tech@openbsd.org
Cc: mpi@openbsd.org
Date: Fri, 9 Feb 2024 14:22:48 -0600

Now that the profile probe is separated from the hardclock() we can
start improving it.

The simplest thing we can do to reduce profiling overhead is to get
rid of clock interrupt staggering.  It's an artifact of the
hardclock().

The problem is intuitive: on average, reading N profiling events
during a single wakeup is cheaper than reading a single profiling
event across N separate wakeups.

Two "gotchas" to take note of:

1. The event buffer in btrace(8) is fixed-size.  On machines with
   lots of CPUs there may not be enough room to grab all the
   profiling events in one read(2).

2. There is a hotspot in dt_pcb_ring_consume() where every CPU on
   the system will try to enter ds_mtx simultaneously to increment
   ds_evtcnt.

Both can be fixed separately.  Plus, the overhead of the mutex
contention in (2) is minuscule compared to the overhead of the extra
wakeups under the current scheme.

This can wait a few days, just in case we need to back out the
recent dt(4) changes.

Index: dt_dev.c
===================================================================
RCS file: /cvs/src/sys/dev/dt/dt_dev.c,v
diff -u -p -r1.30 dt_dev.c
--- dt_dev.c	9 Feb 2024 17:42:18 -0000	1.30
+++ dt_dev.c	9 Feb 2024 20:06:00 -0000
@@ -497,8 +497,6 @@ dt_ioctl_record_start(struct dt_softc *s
 	if (dp->dp_nsecs != 0) {
 		clockintr_bind(&dp->dp_clockintr, dp->dp_cpu, dt_clock,
 		    dp);
-		clockintr_stagger(&dp->dp_clockintr, dp->dp_nsecs,
-		    CPU_INFO_UNIT(dp->dp_cpu), MAXCPUS);
 		clockintr_advance(&dp->dp_clockintr, dp->dp_nsecs);
 	}
 }
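
For anyone who wants a feel for what the removed stagger does, here is
a rough userland sketch.  It is illustrative only, not kernel code: it
assumes clockintr_stagger() offsets each CPU's first expiration by
count/denominator of the period, i.e. by CPU_INFO_UNIT/MAXCPUS of
dp_nsecs, and the PERIOD_NS and NCPUS values below are made up.

#include <stdio.h>

#define PERIOD_NS	10000000ULL	/* hypothetical 10ms profiling period */
#define NCPUS		8		/* hypothetical CPU count */

int
main(void)
{
	unsigned long long off;
	int cpu;

	/*
	 * Staggered: each CPU's first expiry lands at a distinct offset
	 * within the period, so events trickle in one at a time and
	 * btrace(8) takes roughly one wakeup per event.
	 */
	for (cpu = 0; cpu < NCPUS; cpu++) {
		off = PERIOD_NS * cpu / NCPUS;
		printf("staggered:   cpu%d first expiry at +%llu ns\n",
		    cpu, off);
	}

	/*
	 * Unstaggered: every CPU expires together at +PERIOD_NS, so a
	 * single wakeup can drain all NCPUS events.
	 */
	printf("unstaggered: all %d CPUs expire at +%llu ns\n",
	    NCPUS, PERIOD_NS);

	return 0;
}

With the stagger gone, the per-CPU profile interrupts expire together,
which is what lets one wakeup read N events.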