From: Claudio Jeker
Subject: Re: getrusage() memory stats
To: David Gwynne
Cc: Mark Kettenis, tech@openbsd.org
Date: Thu, 1 May 2025 10:07:13 +0200

On Thu, May 01, 2025 at 04:43:19PM +1000, David Gwynne wrote:
> On Sun, Dec 08, 2024 at 10:34:33PM +1000, David Gwynne wrote:
> > On Fri, Dec 06, 2024 at 06:45:44PM +0100, Mark Kettenis wrote:
> > > > Date: Fri, 6 Dec 2024 18:04:40 +0100
> > > > From: Claudio Jeker
> > > >
> > > > On Fri, Dec 06, 2024 at 02:42:14PM +0100, Mark Kettenis wrote:
> > > > > > Date: Fri, 6 Dec 2024 16:13:59 +1000
> > > > > > From: David Gwynne
> > > > > >
> > > > > > this repurposes the memory usage fields in struct rusage to give the
> > > > > > current memory usage info when using getrusage(RUSAGE_SELF). this
> > > > > > is the same as what we can see in tools like top.
> > > > > >
> > > > > > currently these fields are 0. they're documented as being max and some
> > > > > > integral values i dont understand. having the current values is better
> > > > > > than nothing.
> > > > > >
> > > > > > im already using getrusage to collect metrics in some home grown
> > > > > > software, having current memory usage nicely complements the cpu
> > > > > > usage stats. some of these things run out of memory, so understanding
> > > > > > the trend over time will be useful.
> > > > > >
> > > > > > fwiw, netbsd also uses the current values to populate these fields, but
> > > > > > didnt update their doco.
> > > > > >
> > > > > > ok?
> > > > >
> > > > > I don't think this is a good idea.  The getrusage(2) interface has
> > > > > never been about capturing "current" values, but always about
> > > > > "accumulated" values.
> > > >
> > > > Not sure about that:
> > > >
> > > >      RUSAGE_SELF      Resources used by the current process.
> > > >
> > > >      RUSAGE_THREAD    Resources used by the current thread.
> > > >
> > > > This is all about the current process / thread. So I think returning
> > > > values similar to the ones reported via sysctl / kvm_getprocs is something
> > > > we should aim for. At least in these two cases.
> > > >
> > > > RUSAGE_CHILDREN on the other hand is all about "accumulated" values.
> > >
> > > Even RUSAGE_SELF and RUSAGE_THREAD are "accumulated".  The ru_utime and
> > > ru_stime are the total amount of system time that has been used so
> > > far, not some indication about how much time we're currently spending.
> > > Same for things like ru_majflt or ru_nsignals.
> > >
> > > > > I guess maxrss is a bit of an odd one out here but the maximum is
> > > > > still a value that is determined over the entire run.  I guess at
> > > > > least here the "current" value can be seen as an approximation of the
> > > > > true maximum.  But the maximum would be much more interesting as that
> > > > > really is the relevant thing to measure to determine how much memory
> > > > > you really need for your workload.  Is there a reason why you can't
> > > > > use sysctl to measure the "current" values?
> > > >
> > > > I also expressed my worry about changing the meaning of ru_maxrss.
> > > > I think that should indeed always return the maximum rss seen and not
> > > > a current value.
> > > >
> > > > > The 4.3BSD definition of struct rusage has spread widely to other
> > > > > OSes.  Changing its interpretation just on OpenBSD isn't going to help
> > > > > people trying to write portable code.
> > > >
> > > > Right now ru_ixrss, ru_idrss and ru_isrss are always 0 on OpenBSD. So I
> > > > think there is room for improvement even for portable code.
> > >
> > > Sure.  Implement them as they're documented.  Looks like FreeBSD
> > > does something reasonable here:
> > >
> > > void
> > > statclock(int cnt, int usermode)
> > > {
> > > ...
> > >
> > >         /* Update resource usage integrals and maximums. */
> > >         MPASS(p->p_vmspace != NULL);
> > >         vm = p->p_vmspace;
> > >         ru = &td->td_ru;
> > >         ru->ru_ixrss += pgtok(vm->vm_tsize) * cnt;
> > >         ru->ru_idrss += pgtok(vm->vm_dsize) * cnt;
> > >         ru->ru_isrss += pgtok(vm->vm_ssize) * cnt;
> > >         rss = pgtok(vmspace_resident_count(vm));
> > >         if (ru->ru_maxrss < rss)
> > >                 ru->ru_maxrss = rss;
> > >
> > > ...
> > > }
> >
> > i can try and find the right place for this.
> >
> > > Note that we have code in usr.bin/time/time.c that prints these and
> > > divides by a number of ticks to calculate average values.  So changing
> > > these to current values is just wrong.
> >
> > if only there was some way to change code that used these fields...
> >
> > > If we would want to report current values, we should add new members
> > > to the struct.  But I don't think getrusage(2) should be (ab)used to
> > > report current values.
> >
> > technically the current memory used values are aggregations. they're
> > the sum of all previous allocations less the sum of all memory that
> > was returned to the kernel. it just happens to make sense as an
> > instantaneous value, unlike most of the other things in rusage which
> > need some kind of derivation.
> >
> > i think the diff below implements the sampling and accumulation of rss
> > values in statclock like freebsd. uvm currently measures maxrss, but it
> > would be simple to pull it into statclock too if we want.
> >
> > is the kern_exit.c chunk correct?
> >
> > i havent tried collecting these "integral" values periodically
> > within a program or tried to make meaningful graphs out of them, so i
> > can't argue about whether more info in getrusage is necessary or if i
> > can make something comprehensible out of these "proper" measurements.
>
> is now a good time for me to put this in?

Yes, I have been running with this for a while and the reported numbers
seem better to me than what we had before (0).
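
For anyone who wants to eyeball the numbers, a trivial check program is
enough. Untested sketch; with this diff the three integral fields are
finally non-zero, accumulated in kilobyte-ticks of execution, and time(1)
divides them by the consumed cpu ticks to print averages:

#include <sys/resource.h>

#include <stdio.h>

int
main(void)
{
        struct rusage ru;

        if (getrusage(RUSAGE_SELF, &ru) == -1) {
                perror("getrusage");
                return (1);
        }

        /* integrals are in kilobyte-ticks, maxrss is in kilobytes */
        printf("ixrss %ld idrss %ld isrss %ld maxrss %ld\n",
            ru.ru_ixrss, ru.ru_idrss, ru.ru_isrss, ru.ru_maxrss);

        return (0);
}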

Would it make sense to use a tu_rss array like tu_ticks? Maybe then we
could simplify the accumulation code (which the compiler will unroll
again).
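
Something along these lines, mirroring the tu_ticks macros (untested
sketch, the TU_*RSS names are invented here):

#define TU_IXRSS        0       /* text size integral */
#define TU_IDRSS        1       /* data size integral */
#define TU_ISRSS        2       /* stack size integral */
#define TU_RSS_COUNT    3

struct tusage {
        uint64_t        tu_gen;         /* generation counter */
        uint64_t        tu_ticks[TU_TICKS_COUNT];
#define tu_uticks       tu_ticks[TU_UTICKS]
#define tu_sticks       tu_ticks[TU_STICKS]
#define tu_iticks       tu_ticks[TU_ITICKS]
        uint64_t        tu_rss[TU_RSS_COUNT];
#define tu_ixrss        tu_rss[TU_IXRSS]
#define tu_idrss        tu_rss[TU_IDRSS]
#define tu_isrss        tu_rss[TU_ISRSS]
        struct timespec tu_runtime;     /* Realtime. */
};

The per-field additions in tuagg_sumup() then collapse to:

        int i;

        for (i = 0; i < TU_TICKS_COUNT; i++)
                tu->tu_ticks[i] += tmp.tu_ticks[i];
        for (i = 0; i < TU_RSS_COUNT; i++)
                tu->tu_rss[i] += tmp.tu_rss[i];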

OK claudio@

> > Index: sys/proc.h
> > ===================================================================
> > RCS file: /cvs/src/sys/sys/proc.h,v
> > diff -u -p -r1.376 proc.h
> > --- sys/proc.h	22 Oct 2024 11:54:05 -0000	1.376
> > +++ sys/proc.h	8 Dec 2024 12:09:11 -0000
> > @@ -95,11 +95,20 @@ struct pgrp {
> >   * generation counter. Code should use tu_enter() and tu_leave() for this.
> >   * The process ps_tu structure is locked by the ps_mtx.
> >   */
> > +#define TU_UTICKS	0	/* Statclock hits in user mode. */
> > +#define TU_STICKS	1	/* Statclock hits in system mode. */
> > +#define TU_ITICKS	2	/* Statclock hits processing intr. */
> > +#define TU_TICKS_COUNT	3
> > +
> >  struct tusage {
> >  	uint64_t	tu_gen;		/* generation counter */
> > -	uint64_t	tu_uticks;	/* Statclock hits in user mode. */
> > -	uint64_t	tu_sticks;	/* Statclock hits in system mode. */
> > -	uint64_t	tu_iticks;	/* Statclock hits processing intr. */
> > +	uint64_t	tu_ticks[TU_TICKS_COUNT];
> > +#define tu_uticks	tu_ticks[TU_UTICKS]
> > +#define tu_sticks	tu_ticks[TU_STICKS]
> > +#define tu_iticks	tu_ticks[TU_ITICKS]
> > +	uint64_t	tu_ixrss;
> > +	uint64_t	tu_idrss;
> > +	uint64_t	tu_isrss;
> >  	struct timespec	tu_runtime;	/* Realtime. */
> >  };
> >
> > Index: kern/kern_clock.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/kern_clock.c,v
> > diff -u -p -r1.124 kern_clock.c
> > --- kern/kern_clock.c	8 Jul 2024 13:17:11 -0000	1.124
> > +++ kern/kern_clock.c	8 Dec 2024 12:09:11 -0000
> > @@ -49,6 +49,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >
> >  /*
> >   * Clock handling routines.
> > @@ -267,6 +268,8 @@ statclock(struct clockrequest *cr, void
> >  	struct schedstate_percpu *spc = &ci->ci_schedstate;
> >  	struct proc *p = curproc;
> >  	struct process *pr;
> > +	int tu_tick = -1;
> > +	int cp_time;
> >
> >  	if (statclock_is_randomized) {
> >  		count = clockrequest_advance_random(cr, statclock_min,
> > @@ -281,13 +284,8 @@ statclock(struct clockrequest *cr, void
> >  		 * Came from user mode; CPU was in user state.
> >  		 * If this process is being profiled record the tick.
> >  		 */
> > -		tu_enter(&p->p_tu);
> > -		p->p_tu.tu_uticks += count;
> > -		tu_leave(&p->p_tu);
> > -		if (pr->ps_nice > NZERO)
> > -			spc->spc_cp_time[CP_NICE] += count;
> > -		else
> > -			spc->spc_cp_time[CP_USER] += count;
> > +		tu_tick = TU_UTICKS;
> > +		cp_time = (pr->ps_nice > NZERO) ? CP_NICE : CP_USER;
> >  	} else {
> >  		/*
> >  		 * Came from kernel mode, so we were:
> > @@ -303,26 +301,39 @@ statclock(struct clockrequest *cr, void
> >  		 * in ``non-process'' (i.e., interrupt) work.
> >  		 */
> >  		if (CLKF_INTR(frame)) {
> > -			if (p != NULL) {
> > -				tu_enter(&p->p_tu);
> > -				p->p_tu.tu_iticks += count;
> > -				tu_leave(&p->p_tu);
> > -			}
> > -			spc->spc_cp_time[spc->spc_spinning ?
> > -			    CP_SPIN : CP_INTR] += count;
> > +			tu_tick = TU_ITICKS;
> > +			cp_time = CP_INTR;
> >  		} else if (p != NULL && p != spc->spc_idleproc) {
> > -			tu_enter(&p->p_tu);
> > -			p->p_tu.tu_sticks += count;
> > -			tu_leave(&p->p_tu);
> > -			spc->spc_cp_time[spc->spc_spinning ?
> > -			    CP_SPIN : CP_SYS] += count;
> > +			tu_tick = TU_STICKS;
> > +			cp_time = CP_SYS;
> >  		} else
> > -			spc->spc_cp_time[spc->spc_spinning ?
> > -			    CP_SPIN : CP_IDLE] += count;
> > +			cp_time = CP_IDLE;
> > +
> > +		if (spc->spc_spinning)
> > +			cp_time = CP_SPIN;
> >  	}
> >
> > +	spc->spc_cp_time[cp_time] += count;
> > +
> >  	if (p != NULL) {
> >  		p->p_cpticks += count;
> > +
> > +		if (!ISSET(p->p_flag, P_SYSTEM) && tu_tick != -1) {
> > +			struct vmspace *vm = p->p_vmspace;
> > +			struct tusage *tu = &p->p_tu;
> > +
> > +			tu_enter(tu);
> > +			tu->tu_ticks[tu_tick] += count;
> > +
> > +			/* maxrss is handled by uvm */
> > +			if (tu_tick != TU_ITICKS) {
> > +				tu->tu_ixrss += (vm->vm_tsize << (PAGE_SHIFT - 10)) * count;
> > +				tu->tu_idrss += (vm->vm_dused << (PAGE_SHIFT - 10)) * count;
> > +				tu->tu_isrss += (vm->vm_ssize << (PAGE_SHIFT - 10)) * count;
> > +			}
> > +			tu_leave(tu);
> > +		}
> > +
> >  		/*
> >  		 * schedclock() runs every fourth statclock().
> >  		 */
> > Index: kern/kern_exit.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/kern_exit.c,v
> > diff -u -p -r1.239 kern_exit.c
> > --- kern/kern_exit.c	15 Oct 2024 13:49:26 -0000	1.239
> > +++ kern/kern_exit.c	8 Dec 2024 12:09:11 -0000
> > @@ -346,6 +346,9 @@ exit1(struct proc *p, int xexit, int xsi
> >  	 * and calculate the total times.
> >  	 */
> >  	calcru(&pr->ps_tu, &rup->ru_utime, &rup->ru_stime, NULL);
> > +	rup->ru_ixrss = pr->ps_tu.tu_ixrss;
> > +	rup->ru_idrss = pr->ps_tu.tu_idrss;
> > +	rup->ru_isrss = pr->ps_tu.tu_isrss;
> >  	ruadd(rup, &pr->ps_cru);
> >
> >  	/*
> > Index: kern/kern_resource.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/kern_resource.c,v
> > diff -u -p -r1.93 kern_resource.c
> > --- kern/kern_resource.c	10 Nov 2024 06:45:36 -0000	1.93
> > +++ kern/kern_resource.c	8 Dec 2024 12:09:11 -0000
> > @@ -395,6 +395,9 @@ tuagg_sumup(struct tusage *tu, const str
> >  	tu->tu_uticks += tmp.tu_uticks;
> >  	tu->tu_sticks += tmp.tu_sticks;
> >  	tu->tu_iticks += tmp.tu_iticks;
> > +	tu->tu_ixrss += tmp.tu_ixrss;
> > +	tu->tu_idrss += tmp.tu_idrss;
> > +	tu->tu_isrss += tmp.tu_isrss;
> >  	timespecadd(&tu->tu_runtime, &tmp.tu_runtime, &tu->tu_runtime);
> >  }
> >
> > @@ -440,6 +443,7 @@ tuagg_add_process(struct process *pr, st
> >  	/* Now reset CPU time usage for the thread. */
> >  	timespecclear(&p->p_tu.tu_runtime);
> >  	p->p_tu.tu_uticks = p->p_tu.tu_sticks = p->p_tu.tu_iticks = 0;
> > +	p->p_tu.tu_ixrss = p->p_tu.tu_idrss = p->p_tu.tu_isrss = 0;
> >  }
> >
> >  void
> > @@ -567,6 +571,10 @@ dogetrusage(struct proc *p, int who, str
> >  	}
> >
> >  	calcru(&tu, &rup->ru_utime, &rup->ru_stime, NULL);
> > +
> > +	rup->ru_ixrss = tu.tu_ixrss;
> > +	rup->ru_idrss = tu.tu_idrss;
> > +	rup->ru_isrss = tu.tu_isrss;
> >  	break;
> >
> >  	case RUSAGE_THREAD:

-- 
:wq Claudio