Index | Thread | Search

From:
Claudio Jeker <cjeker@diehard.n-r-g.com>
Subject:
Re: getrusage() memory stats
To:
David Gwynne <david@gwynne.id.au>
Cc:
Mark Kettenis <mark.kettenis@xs4all.nl>, tech@openbsd.org
Date:
Thu, 1 May 2025 10:07:13 +0200

Download raw body.

Thread
  • Lyndon Nerenberg (VE7TFX/VE6BBM):

    getrusage() memory stats

  • On Thu, May 01, 2025 at 04:43:19PM +1000, David Gwynne wrote:
    > On Sun, Dec 08, 2024 at 10:34:33PM +1000, David Gwynne wrote:
    > > On Fri, Dec 06, 2024 at 06:45:44PM +0100, Mark Kettenis wrote:
    > > > > Date: Fri, 6 Dec 2024 18:04:40 +0100
    > > > > From: Claudio Jeker <cjeker@diehard.n-r-g.com>
    > > > > 
    > > > > On Fri, Dec 06, 2024 at 02:42:14PM +0100, Mark Kettenis wrote:
    > > > > > > Date: Fri, 6 Dec 2024 16:13:59 +1000
    > > > > > > From: David Gwynne <david@gwynne.id.au>
    > > > > > > 
    > > > > > > this repurposes the memory usage fields in struct rusage to give the
    > > > > > > current memory usage info when using getrusage(RUSAGE_SELF). this
    > > > > > > is the same as what we can see in tools like top.
    > > > > > > 
    > > > > > > currently these fields are 0. they're documented as being max and some
    > > > > > > integral values i dont understand. having the current values is better
    > > > > > > than nothing.
    > > > > > > 
    > > > > > > im already using getrusage to collect metrics in some home grown
    > > > > > > software, having current memory usage nicely complements the cpu
    > > > > > > usage stats. some of these things run out of memory, so understanding
    > > > > > > the trend over time will be useful.
    > > > > > > 
    > > > > > > fwiw, netbsd also uses the current values to populate these fields, but
    > > > > > > didnt update their doco.
    > > > > > > 
    > > > > > > ok?
    > > > > > 
    > > > > > I don't think this is a good idea.  The getrusage(2) interface has
    > > > > > never been about capturing "current" values, but always about
    > > > > > "accumulated" values.
    > > > > 
    > > > > Not sure about that:
    > > > > 
    > > > > RUSAGE_SELF      Resources used by the current process.
    > > > > 
    > > > > RUSAGE_THREAD    Resources used by the current thread.
    > > > > 
    > > > > This is all about the current process / thread. So I think returning
    > > > > values similar to the ones reported via sysctl / kvm_getprocs is something
    > > > > we should aim for. At least in these two cases.
    > > > > 
    > > > > RUSAGE_CHILDREN on the other hand is all about "accumulated" values.
    > > > 
    > > > Even RUSAGE_SELF and RUSAGE_THREAD are "accumlated".  The ru_utime and
    > > > ru_stime are the total amount of system time that has been used so
    > > > far, not some indication about how much time we're currently spending.
    > > > Same for things like ru_majflt or ru_nsignals.
    > > > 
    > > > > > I guess maxrss is a bit of an odd one out here but the maximum is
    > > > > > still a value that is determined over the entire run.  I guess at
    > > > > > least here the "current" value can be seen as an approximation of the
    > > > > > true maximum.  But the maximum would be much more interesting as that
    > > > > > really is the relevant thing to measure to determine how much memory
    > > > > > you really need for your workload.  Is there a reason why you can't
    > > > > > use sysctl to measure the "current" values?
    > > > > 
    > > > > I also expressed my worry about changing the meaning of ru_maxrss.
    > > > > I think that should indeed always return the maximum rss seen and not
    > > > > a current value.
    > > > >  
    > > > > > The 4.3BSD definition of struct rusage has spread widely to other
    > > > > > OSes.  Changing its interpretation just on OpenBSD isn't going to help
    > > > > > people trying to write portable code.
    > > > > 
    > > > > Right now ru_ixrss, ru_idrss and ru_isrss are always 0 on OpenBSD. So I
    > > > > think there is room for improvement even for portable code.
    > > > 
    > > > Sure.  Implement them to as they're documented.  Looks like FreeBSD
    > > > does something reasonable here:
    > > > 
    > > > void
    > > > statclock(int cnt, int usermode)
    > > > {
    > > >         ...
    > > > 
    > > >         /* Update resource usage integrals and maximums. */
    > > >         MPASS(p->p_vmspace != NULL);
    > > >         vm = p->p_vmspace;
    > > >         ru = &td->td_ru;
    > > >         ru->ru_ixrss += pgtok(vm->vm_tsize) * cnt;
    > > >         ru->ru_idrss += pgtok(vm->vm_dsize) * cnt;
    > > >         ru->ru_isrss += pgtok(vm->vm_ssize) * cnt;
    > > >         rss = pgtok(vmspace_resident_count(vm));
    > > >         if (ru->ru_maxrss < rss)
    > > >                 ru->ru_maxrss = rss;
    > > > 
    > > >         ...
    > > > }
    > > 
    > > i can try and find the right place for this.
    > > 
    > > > Note that we have code in usr.bin/time/time.c that prints these and
    > > > divides by a number of ticks to calculate average values.  So changing
    > > > these to current values is just wrong.
    > > 
    > > if only there was some way to change code that used these fields...
    > > 
    > > > If we would want to report current values, we should add new members
    > > > to the struct.  But I don't think getrusage(2) should be (ab)used to
    > > > report current values.
    > > 
    > > technically the current memory used values are aggregations. they're
    > > the sum of all previous allocations less the sum of all memory that
    > > was returned to the kernel. it just happens to make sense as an
    > > instantaneous value, unlike most of the other things in rusage which
    > > need some kind of derivation.
    > > 
    > > i think the diff below implements the sampling and accumulation of rss
    > > values in statclock like freebsd. uvm currently measures maxrss, but it
    > > would be simple to pull it into statclock too if we want.
    > > 
    > > is the kern_exit.c chunk correct?
    > > 
    > > i havent tried collecting these these "integral" values periodically
    > > within a program or try to make meaningful graphs out of them, so i
    > > can't argue about whether more info in getrusage is necessary or if i
    > > can make something comprehensible out of these "proper" measurements.
    > 
    > is now a good time for me to put this in?
    
    Yes, I have been running with this for a while and the reported numbers
    seem better to me than what we had before (0).
     
    Would it make sense to use a tu_rss array like tu_ticks?
    Maybe then we could simplify the accumulation code (which the compiler
    will unroll again).
    
    OK claudio@
    
    > > Index: sys/proc.h
    > > ===================================================================
    > > RCS file: /cvs/src/sys/sys/proc.h,v
    > > diff -u -p -r1.376 proc.h
    > > --- sys/proc.h	22 Oct 2024 11:54:05 -0000	1.376
    > > +++ sys/proc.h	8 Dec 2024 12:09:11 -0000
    > > @@ -95,11 +95,20 @@ struct	pgrp {
    > >   * generation counter. Code should use tu_enter() and tu_leave() for this.
    > >   * The process ps_tu structure is locked by the ps_mtx.
    > >   */
    > > +#define TU_UTICKS	0		/* Statclock hits in user mode. */
    > > +#define TU_STICKS	1		/* Statclock hits in system mode. */
    > > +#define TU_ITICKS	2		/* Statclock hits processing intr. */
    > > +#define TU_TICKS_COUNT	3
    > > +
    > >  struct tusage {
    > >  	uint64_t	tu_gen;		/* generation counter */
    > > -	uint64_t	tu_uticks;	/* Statclock hits in user mode. */
    > > -	uint64_t	tu_sticks;	/* Statclock hits in system mode. */
    > > -	uint64_t	tu_iticks;	/* Statclock hits processing intr. */
    > > +	uint64_t	tu_ticks[TU_TICKS_COUNT];
    > > +#define tu_uticks	tu_ticks[TU_UTICKS]
    > > +#define tu_sticks	tu_ticks[TU_STICKS]
    > > +#define tu_iticks	tu_ticks[TU_ITICKS]
    > > +	uint64_t	tu_ixrss;
    > > +	uint64_t	tu_idrss;
    > > +	uint64_t	tu_isrss;
    > >  	struct	timespec tu_runtime;	/* Realtime. */
    > >  };
    > >  
    > > Index: kern/kern_clock.c
    > > ===================================================================
    > > RCS file: /cvs/src/sys/kern/kern_clock.c,v
    > > diff -u -p -r1.124 kern_clock.c
    > > --- kern/kern_clock.c	8 Jul 2024 13:17:11 -0000	1.124
    > > +++ kern/kern_clock.c	8 Dec 2024 12:09:11 -0000
    > > @@ -49,6 +49,7 @@
    > >  #include <sys/sysctl.h>
    > >  #include <sys/sched.h>
    > >  #include <sys/timetc.h>
    > > +#include <uvm/uvm_extern.h>
    > >  
    > >  /*
    > >   * Clock handling routines.
    > > @@ -267,6 +268,8 @@ statclock(struct clockrequest *cr, void 
    > >  	struct schedstate_percpu *spc = &ci->ci_schedstate;
    > >  	struct proc *p = curproc;
    > >  	struct process *pr;
    > > +	int tu_tick = -1;
    > > +	int cp_time;
    > >  
    > >  	if (statclock_is_randomized) {
    > >  		count = clockrequest_advance_random(cr, statclock_min,
    > > @@ -281,13 +284,8 @@ statclock(struct clockrequest *cr, void 
    > >  		 * Came from user mode; CPU was in user state.
    > >  		 * If this process is being profiled record the tick.
    > >  		 */
    > > -		tu_enter(&p->p_tu);
    > > -		p->p_tu.tu_uticks += count;
    > > -		tu_leave(&p->p_tu);
    > > -		if (pr->ps_nice > NZERO)
    > > -			spc->spc_cp_time[CP_NICE] += count;
    > > -		else
    > > -			spc->spc_cp_time[CP_USER] += count;
    > > +		tu_tick = TU_UTICKS;
    > > +		cp_time = (pr->ps_nice > NZERO) ? CP_NICE : CP_USER;
    > >  	} else {
    > >  		/*
    > >  		 * Came from kernel mode, so we were:
    > > @@ -303,26 +301,39 @@ statclock(struct clockrequest *cr, void 
    > >  		 * in ``non-process'' (i.e., interrupt) work.
    > >  		 */
    > >  		if (CLKF_INTR(frame)) {
    > > -			if (p != NULL) {
    > > -				tu_enter(&p->p_tu);
    > > -				p->p_tu.tu_iticks += count;
    > > -				tu_leave(&p->p_tu);
    > > -			}
    > > -			spc->spc_cp_time[spc->spc_spinning ?
    > > -			    CP_SPIN : CP_INTR] += count;
    > > +			tu_tick = TU_ITICKS;
    > > +			cp_time = CP_INTR;
    > >  		} else if (p != NULL && p != spc->spc_idleproc) {
    > > -			tu_enter(&p->p_tu);
    > > -			p->p_tu.tu_sticks += count;
    > > -			tu_leave(&p->p_tu);
    > > -			spc->spc_cp_time[spc->spc_spinning ?
    > > -			    CP_SPIN : CP_SYS] += count;
    > > +			tu_tick = TU_STICKS;
    > > +			cp_time = CP_SYS;
    > >  		} else
    > > -			spc->spc_cp_time[spc->spc_spinning ?
    > > -			    CP_SPIN : CP_IDLE] += count;
    > > +			cp_time = CP_IDLE;
    > > +
    > > +		if (spc->spc_spinning)
    > > +			cp_time = CP_SPIN;
    > >  	}
    > >  
    > > +	spc->spc_cp_time[cp_time] += count;
    > > +
    > >  	if (p != NULL) {
    > >  		p->p_cpticks += count;
    > > +
    > > +		if (!ISSET(p->p_flag, P_SYSTEM) && tu_tick != -1) {
    > > +			struct vmspace *vm = p->p_vmspace;
    > > +			struct tusage *tu = &p->p_tu;
    > > +
    > > +			tu_enter(tu);
    > > +			tu->tu_ticks[tu_tick] += count;
    > > +
    > > +			/* maxrss is handled by uvm */
    > > +			if (tu_tick != TU_ITICKS) {
    > > +				tu->tu_ixrss += (vm->vm_tsize << (PAGE_SHIFT - 10)) * count;
    > > +				tu->tu_idrss += (vm->vm_dused << (PAGE_SHIFT - 10)) * count;
    > > +				tu->tu_isrss += (vm->vm_ssize << (PAGE_SHIFT - 10)) * count;
    > > +			}
    > > +			tu_leave(tu);
    > > +		}
    > > +
    > >  		/*
    > >  		 * schedclock() runs every fourth statclock().
    > >  		 */
    > > Index: kern/kern_exit.c
    > > ===================================================================
    > > RCS file: /cvs/src/sys/kern/kern_exit.c,v
    > > diff -u -p -r1.239 kern_exit.c
    > > --- kern/kern_exit.c	15 Oct 2024 13:49:26 -0000	1.239
    > > +++ kern/kern_exit.c	8 Dec 2024 12:09:11 -0000
    > > @@ -346,6 +346,9 @@ exit1(struct proc *p, int xexit, int xsi
    > >  		 * and calculate the total times.
    > >  		 */
    > >  		calcru(&pr->ps_tu, &rup->ru_utime, &rup->ru_stime, NULL);
    > > +		rup->ru_ixrss = pr->ps_tu.tu_ixrss;
    > > +		rup->ru_idrss = pr->ps_tu.tu_idrss;
    > > +		rup->ru_isrss = pr->ps_tu.tu_isrss;
    > >  		ruadd(rup, &pr->ps_cru);
    > >  
    > >  		/*
    > > Index: kern/kern_resource.c
    > > ===================================================================
    > > RCS file: /cvs/src/sys/kern/kern_resource.c,v
    > > diff -u -p -r1.93 kern_resource.c
    > > --- kern/kern_resource.c	10 Nov 2024 06:45:36 -0000	1.93
    > > +++ kern/kern_resource.c	8 Dec 2024 12:09:11 -0000
    > > @@ -395,6 +395,9 @@ tuagg_sumup(struct tusage *tu, const str
    > >  	tu->tu_uticks += tmp.tu_uticks;
    > >  	tu->tu_sticks += tmp.tu_sticks;
    > >  	tu->tu_iticks += tmp.tu_iticks;
    > > +	tu->tu_ixrss += tmp.tu_ixrss;
    > > +	tu->tu_idrss += tmp.tu_idrss;
    > > +	tu->tu_isrss += tmp.tu_isrss;
    > >  	timespecadd(&tu->tu_runtime, &tmp.tu_runtime, &tu->tu_runtime);
    > >  }
    > >  
    > > @@ -440,6 +443,7 @@ tuagg_add_process(struct process *pr, st
    > >  	/* Now reset CPU time usage for the thread. */
    > >  	timespecclear(&p->p_tu.tu_runtime);
    > >  	p->p_tu.tu_uticks = p->p_tu.tu_sticks = p->p_tu.tu_iticks = 0;
    > > +	p->p_tu.tu_ixrss = p->p_tu.tu_idrss = p->p_tu.tu_isrss = 0;
    > >  }
    > >  
    > >  void
    > > @@ -567,6 +571,10 @@ dogetrusage(struct proc *p, int who, str
    > >  		}
    > >  
    > >  		calcru(&tu, &rup->ru_utime, &rup->ru_stime, NULL);
    > > +
    > > +		rup->ru_ixrss = tu.tu_ixrss;
    > > +		rup->ru_idrss = tu.tu_idrss;
    > > +		rup->ru_isrss = tu.tu_isrss;
    > >  		break;
    > >  
    > >  	case RUSAGE_THREAD:
    
    -- 
    :wq Claudio
    
    
  • Lyndon Nerenberg (VE7TFX/VE6BBM):

    getrusage() memory stats