dt(4): profile: don't stagger clock interrupts
On Fri, Feb 09, 2024 at 02:22:48PM -0600, Scott Cheloha wrote:
> Now that the profile probe is separated from the hardclock() we can
> start improving it.
>
> The simplest thing we can do to reduce profiling overhead is to get
> rid of clock interrupt staggering. It's an artifact of the
> hardclock(). The reasoning is intuitive: on average, reading N
> profiling events during a single wakeup is cheaper than reading
> them one at a time across N separate wakeups.
>
> Two "gotchas" to take note of:
>
> 1. The event buffer in btrace(8) is fixed-size. On machines with
> lots of CPUs there may not be enough room to grab all the
> profiling events in one read(2).
>
> 2. There is a hotspot in dt_pcb_ring_consume() where every
> CPU on the system will try to enter ds_mtx simultaneously
> to increment ds_evtcnt.
>
> Both can be fixed separately. Plus, the overhead of the mutex
> contention in (2) is minuscule compared to the overhead of the extra
> wakeups under the current scheme.
>
> This can wait a few days, just in case we need to back out the recent
> dt(4) changes.
Ping.
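
To make gotcha (2) above concrete, here is a minimal sketch of the
pattern.  This is not the real dt_pcb_ring_consume(): the function
name, its argument, and the wakeup() call are stand-ins for
illustration; only ds_mtx and ds_evtcnt come from the driver.

void
dt_ring_consume_sketch(struct dt_softc *sc)
{
	/* ... copy this CPU's events out of its per-CPU ring ... */

	mtx_enter(&sc->ds_mtx);		/* every CPU serializes here */
	sc->ds_evtcnt++;		/* global "events available" count */
	mtx_leave(&sc->ds_mtx);
	wakeup(sc);			/* assumed: wake up the read(2) side */
}

With staggering removed, all of those mutex acquisitions happen back
to back when the CPUs expire together, but as noted above that cost
is small next to taking N separate wakeups.
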
Index: dt_dev.c
===================================================================
RCS file: /cvs/src/sys/dev/dt/dt_dev.c,v
diff -u -p -r1.30 dt_dev.c
--- dt_dev.c 9 Feb 2024 17:42:18 -0000 1.30
+++ dt_dev.c 9 Feb 2024 20:06:00 -0000
@@ -497,8 +497,6 @@ dt_ioctl_record_start(struct dt_softc *s
if (dp->dp_nsecs != 0) {
clockintr_bind(&dp->dp_clockintr, dp->dp_cpu, dt_clock,
dp);
- clockintr_stagger(&dp->dp_clockintr, dp->dp_nsecs,
- CPU_INFO_UNIT(dp->dp_cpu), MAXCPUS);
clockintr_advance(&dp->dp_clockintr, dp->dp_nsecs);
}
}
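
For context on what the removed call was doing: assuming
clockintr_stagger() offsets a clock interrupt's first expiration by
roughly period * numer / denom, the old code spread each CPU's first
profile interrupt across the period in proportion to its unit number,
which is exactly what forced the reader to take one wakeup per CPU.
A small userland illustration (the CPU count and 10ms period are made
up, and EXAMPLE_MAXCPUS stands in for MAXCPUS):

#include <stdio.h>
#include <stdint.h>

#define EXAMPLE_MAXCPUS	8		/* stands in for MAXCPUS */

int
main(void)
{
	uint64_t period = 10000000;	/* example 10ms profile period, in ns */
	uint32_t unit;

	for (unit = 0; unit < EXAMPLE_MAXCPUS; unit++) {
		/* before: each CPU offset by its share of the period */
		uint64_t staggered = period / EXAMPLE_MAXCPUS * unit;
		/* after: every CPU expires on the same edge */
		uint64_t unstaggered = 0;

		printf("cpu%u: staggered %llu ns, unstaggered %llu ns\n",
		    (unsigned)unit, (unsigned long long)staggered,
		    (unsigned long long)unstaggered);
	}
	return 0;
}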