From: Claudio Jeker <claudio@openbsd.org>
Subject: Re: dt(4): interval/profile: schedule clockintr relative to start of recording
To: Scott Cheloha <scottcheloha@gmail.com>
Cc: tech@openbsd.org, mpi@openbsd.org
Date: Wed, 28 Feb 2024 08:59:56 +0100

On Tue, Feb 27, 2024 at 01:59:14PM -0600, Scott Cheloha wrote:
> On Mon, Feb 26, 2024 at 12:07:50PM +0100, Claudio Jeker wrote:
> > On Sat, Feb 24, 2024 at 04:01:28PM -0600, Scott Cheloha wrote:
> > > On Sat, Feb 24, 2024 at 10:30:48PM +0100, Claudio Jeker wrote:
> > > > On Sat, Feb 24, 2024 at 10:03:27AM -0600, Scott Cheloha wrote:
> > > > > To align btrace(8)'s behavior with bpftrace we need to schedule the
> > > > > interval/profile clock interrupts relative to the start of recording,
> > > > > not relative to the start of the uptime clock.
> > > > > 
> > > > > The problem is obvious when you compare behavior.
> > > > > 
> > > > > bpftrace:
> > > > > 
> > > > > alpine:~# uname -a
> > > > > Linux alpine 6.6.16-0-lts #1-Alpine SMP PREEMPT_DYNAMIC Wed, 07 Feb 2024 18:00:38 +0000 x86_64 Linux
> > > > > alpine:~# bpftrace --version
> > > > > bpftrace v0.19.1
> > > > > alpine:~# cat interval-start-latency.bt
> > > > > BEGIN {
> > > > > 	@t0 = nsecs;
> > > > > }
> > > > > 
> > > > > interval:hz:1 {
> > > > > 	$dt = nsecs - @t0;
> > > > > 	printf("elapsed %d.%09d\n", $dt / 1000000000, $dt % 1000000000);
> > > > > 	exit();
> > > > > }
> > > > > alpine:~# bpftrace -q interval-start-latency.bt | head -n 1
> > > > > elapsed 1.014731440
> > > > > alpine:~# cat profile-start-latency.bt
> > > > > BEGIN {
> > > > > 	@t0 = nsecs;
> > > > > }
> > > > > 
> > > > > profile:hz:1 {
> > > > > 	$dt = nsecs - @t0;
> > > > > 	printf("elapsed %d.%09d\n", $dt / 1000000000, $dt % 1000000000);
> > > > > 	exit();
> > > > > }
> > > > > alpine:~# bpftrace -q profile-start-latency.bt | head -n 1
> > > > > elapsed 1.023011922
> > > > > 
> > > > > btrace(8):
> > > > > 
> > > > > $ doas btrace interval-start-latency.bt
> > > > > elapsed 0.586779311
> > > > > $ doas btrace profile-start-latency.bt
> > > > > elapsed 0.252215375
> > > > > 
> > > > > Fix attached.  ok?
> > > > 
> > > > btrace is not bpftrace. I think in most cases we should not try to emulate
> > > > bpftrace since that will go poorly.
> > > 
> > > I thought one of the goals of btrace(8) *was* to emulate bpftrace.  At
> > > least, I thought that was a goal when doing so wasn't difficult or
> > > overly complicated.
> > 
> > While btrace(8) is emulating bpftrace it is a different tool and we don't
> > need to be feature and bug compatible. That's all I want to point out.
> 
> Okay, I think we are on the same page.
> 
> > > Even if that isn't the case (?), in this particular instance I really
> > > think the bpftrace behavior is better than ours.
> > > 
> > > When I run a periodic probe, I want the execution period to be
> > > consistent because that makes it simpler to compare two intervals.
> > > 
> > > Example: suppose you're printing a summary of whatever you're
> > > measuring once per second.  You probably want the measurement interval
> > > to be the same for each summary, i.e. you want the first summary to
> > > print 1 second after you start measuring, and then the next summary to
> > > print 1 second after that, and so on and so forth.  This patch makes
> > > btrace(8) do exactly that.
> > 
> > Yes, this makes sense. We want to start recording all probes at the same
> > time and then they should fire at their rate. I agree that this behaviour
> > is less surprising.
> >  
> > > Scheduling the clock interrupt relative to the moment we start
> > > recording also yields more predictable behavior across different runs
> > > of a script.  Right now, the latency between BEGIN and the first time
> > > a periodic probe fires is essentially random each time you run.
> > 
> > Could we simplify this diff please? I see no reason why now is set
> > conditionally. See below.
> >  
> > > Index: dt_dev.c
> > > ===================================================================
> > > RCS file: /cvs/src/sys/dev/dt/dt_dev.c,v
> > > diff -u -p -r1.31 dt_dev.c
> > > --- dt_dev.c	18 Feb 2024 00:54:03 -0000	1.31
> > > +++ dt_dev.c	24 Feb 2024 15:59:17 -0000
> > > @@ -477,7 +477,9 @@ dt_ioctl_get_stats(struct dt_softc *sc, 
> > >  int
> > >  dt_ioctl_record_start(struct dt_softc *sc)
> > >  {
> > > +	uint64_t now;
> > >  	struct dt_pcb *dp;
> > > +	int need_time = 1;
> > >  
> > >  	if (sc->ds_recording)
> > >  		return EBUSY;
> > > @@ -495,9 +497,14 @@ dt_ioctl_record_start(struct dt_softc *s
> > >  		dtp->dtp_prov->dtpv_recording++;
> > >  
> > >  		if (dp->dp_nsecs != 0) {
> > > +			if (need_time) {
> > > +				need_time = 0;
> > > +				now = nsecuptime();
> > > +			}
> > 
> > Instead of adding need_time can't we just call now = nsecuptime(); between
> >         rw_enter_write(&dt_lock);
> > and
> >         TAILQ_FOREACH(dp, &sc->ds_pcbs, dp_snext) {
> > 
> > dt_ioctl_record_start() is not called that frequently that an extra
> > nsecuptime() would be noticed.
> 
> Yeah, I guess I'm being a little too fussy.  Let's go with that.
> 
> ok?
> 
> Index: dt_dev.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/dt/dt_dev.c,v
> diff -u -p -r1.31 dt_dev.c
> --- dt_dev.c	18 Feb 2024 00:54:03 -0000	1.31
> +++ dt_dev.c	27 Feb 2024 19:58:30 -0000
> @@ -477,6 +477,7 @@ dt_ioctl_get_stats(struct dt_softc *sc, 
>  int
>  dt_ioctl_record_start(struct dt_softc *sc)
>  {
> +	uint64_t now;
>  	struct dt_pcb *dp;
>  
>  	if (sc->ds_recording)
> @@ -487,6 +488,7 @@ dt_ioctl_record_start(struct dt_softc *s
>  		return ENOENT;
>  
>  	rw_enter_write(&dt_lock);
> +	now = nsecuptime();
>  	TAILQ_FOREACH(dp, &sc->ds_pcbs, dp_snext) {
>  		struct dt_probe *dtp = dp->dp_dtp;
>  
> @@ -497,7 +499,8 @@ dt_ioctl_record_start(struct dt_softc *s
>  		if (dp->dp_nsecs != 0) {
>  			clockintr_bind(&dp->dp_clockintr, dp->dp_cpu, dt_clock,
>  			    dp);
> -			clockintr_advance(&dp->dp_clockintr, dp->dp_nsecs);
> +			clockintr_schedule(&dp->dp_clockintr,
> +			    now + dp->dp_nsecs);
>  		}
>  	}
>  	rw_exit_write(&dt_lock);

OK claudio@
-- 
:wq Claudio