dt(4): profile: don't stagger clock interrupts
On 09/02/24(Fri) 14:22, Scott Cheloha wrote:
> Now that the profile probe is separated from the hardclock() we can
> start improving it.
>
> The simplest thing we can do to reduce profiling overhead is to get
> rid of clock interrupt staggering. It's an artifact of the
> hardclock(). The problem is intuitive: on average, reading N
> profiling events during a single wakeup is cheaper than reading a
> single profiling event across N separate wakeups.
I don't understand why staggering was needed in the first place. It
can't be an artifact of hardclock() since you added it, so sure go
ahead and remove it.
I'm reading the manual you sent me. Did you forget to commit it?
> Two "gotchas" to take note of:
>
> 1. The event buffer in btrace(8) is fixed-size. On machines with
> lots of CPUs there may not be enough room to grab all the
> profiling events in one read(2).
>
> 2. There is a hotspot in dt_pcb_ring_consume() where every
> CPU on the system will try to enter ds_mtx simultaneously
> to increment ds_evtcnt.
>
> Both can be fixed separately. Plus, the overhead of the mutex
> contention in (2) is minuscule compared to the overhead of the extra
> wakeups under the current scheme.
>
> This can wait a few days, just in case we need to back out the recent
> dt(4) changes.
>
> Index: dt_dev.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/dt/dt_dev.c,v
> diff -u -p -r1.30 dt_dev.c
> --- dt_dev.c 9 Feb 2024 17:42:18 -0000 1.30
> +++ dt_dev.c 9 Feb 2024 20:06:00 -0000
> @@ -497,8 +497,6 @@ dt_ioctl_record_start(struct dt_softc *s
>  		if (dp->dp_nsecs != 0) {
>  			clockintr_bind(&dp->dp_clockintr, dp->dp_cpu, dt_clock,
>  			    dp);
> -			clockintr_stagger(&dp->dp_clockintr, dp->dp_nsecs,
> -			    CPU_INFO_UNIT(dp->dp_cpu), MAXCPUS);
>  			clockintr_advance(&dp->dp_clockintr, dp->dp_nsecs);
>  		}
>  	}