getrusage() memory stats
On Fri, Dec 06, 2024 at 06:45:44PM +0100, Mark Kettenis wrote:
> > Date: Fri, 6 Dec 2024 18:04:40 +0100
> > From: Claudio Jeker <cjeker@diehard.n-r-g.com>
> >
> > On Fri, Dec 06, 2024 at 02:42:14PM +0100, Mark Kettenis wrote:
> > > > Date: Fri, 6 Dec 2024 16:13:59 +1000
> > > > From: David Gwynne <david@gwynne.id.au>
> > > >
> > > > this repurposes the memory usage fields in struct rusage to give the
> > > > current memory usage info when using getrusage(RUSAGE_SELF). this
> > > > is the same as what we can see in tools like top.
> > > >
> > > > currently these fields are 0. they're documented as being max and some
> > > > integral values i don't understand. having the current values is better
> > > > than nothing.
> > > >
> > > > i'm already using getrusage to collect metrics in some home grown
> > > > software, having current memory usage nicely complements the cpu
> > > > usage stats. some of these things run out of memory, so understanding
> > > > the trend over time will be useful.
> > > >
> > > > fwiw, netbsd also uses the current values to populate these fields, but
> > > > didn't update their doco.
> > > >
> > > > ok?
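
fwiw the collection side looks roughly like this; report_metric() here
is just a hypothetical stand-in for whatever the metrics software
actually does with the sampled values:

#include <sys/time.h>
#include <sys/resource.h>
#include <stdio.h>

/* hypothetical sink for a sampled value */
static void
report_metric(const char *name, long long value)
{
	printf("%s=%lld\n", name, value);
}

static void
collect_rusage_metrics(void)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) == -1)
		return;

	/* cpu usage so far, in microseconds */
	report_metric("utime_us", (long long)ru.ru_utime.tv_sec * 1000000 +
	    ru.ru_utime.tv_usec);
	report_metric("stime_us", (long long)ru.ru_stime.tv_sec * 1000000 +
	    ru.ru_stime.tv_usec);
	/* the memory fields under discussion; ru_ixrss, ru_idrss and
	 * ru_isrss are currently always 0 on OpenBSD */
	report_metric("maxrss_kb", ru.ru_maxrss);
}
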
> > >
> > > I don't think this is a good idea. The getrusage(2) interface has
> > > never been about capturing "current" values, but always about
> > > "accumulated" values.
> >
> > Not sure about that:
> >
> > RUSAGE_SELF Resources used by the current process.
> >
> > RUSAGE_THREAD Resources used by the current thread.
> >
> > This is all about the current process / thread. So I think returning
> > values similar to the ones reported via sysctl / kvm_getprocs is something
> > we should aim for. At least in these two cases.
> >
> > RUSAGE_CHILDREN on the other hand is all about "accumulated" values.
>
> Even RUSAGE_SELF and RUSAGE_THREAD are "accumulated". The ru_utime and
> ru_stime fields are the total amounts of user and system time used so
> far, not some indication of how much time we're currently spending.
> Same for things like ru_majflt or ru_nsignals.
>
> > > I guess maxrss is a bit of an odd one out here but the maximum is
> > > still a value that is determined over the entire run. I guess at
> > > least here the "current" value can be seen as an approximation of the
> > > true maximum. But the maximum would be much more interesting as that
> > > really is the relevant thing to measure to determine how much memory
> > > you really need for your workload. Is there a reason why you can't
> > > use sysctl to measure the "current" values?
> >
> > I also expressed my worry about changing the meaning of ru_maxrss.
> > I think that should indeed always return the maximum rss seen and not
> > a current value.
> >
> > > The 4.3BSD definition of struct rusage has spread widely to other
> > > OSes. Changing its interpretation just on OpenBSD isn't going to help
> > > people trying to write portable code.
> >
> > Right now ru_ixrss, ru_idrss and ru_isrss are always 0 on OpenBSD. So I
> > think there is room for improvement even for portable code.
>
> Sure. Implement them as they're documented. Looks like FreeBSD
> does something reasonable here:
>
> void
> statclock(int cnt, int usermode)
> {
> ...
>
> /* Update resource usage integrals and maximums. */
> MPASS(p->p_vmspace != NULL);
> vm = p->p_vmspace;
> ru = &td->td_ru;
> ru->ru_ixrss += pgtok(vm->vm_tsize) * cnt;
> ru->ru_idrss += pgtok(vm->vm_dsize) * cnt;
> ru->ru_isrss += pgtok(vm->vm_ssize) * cnt;
> rss = pgtok(vmspace_resident_count(vm));
> if (ru->ru_maxrss < rss)
> ru->ru_maxrss = rss;
>
> ...
> }
i can try to find the right place for this.
> Note that we have code in usr.bin/time/time.c that prints these and
> divides by a number of ticks to calculate average values. So changing
> these to current values is just wrong.
if only there were some way to change the code that uses these fields...
> If we would want to report current values, we should add new members
> to the struct. But I don't think getrusage(2) should be (ab)used to
> report current values.
technically the current memory usage values are aggregations: they're
the sum of all previous allocations less the sum of all memory that
was returned to the kernel. it just happens that the result makes
sense as an instantaneous value, unlike most of the other things in
rusage, which need some kind of derivation.
i think the diff below implements the sampling and accumulation of rss
values in statclock like freebsd does. uvm currently measures maxrss, but
it would be simple to pull that into statclock too if we want.
is the kern_exit.c chunk correct?
i haven't tried collecting these "integral" values periodically
within a program or tried to make meaningful graphs out of them, so i
can't argue about whether more info in getrusage is necessary or whether
i can make something comprehensible out of these "proper" measurements.
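
for reference, this is roughly how a consumer turns the integrals back
into averages, the way usr.bin/time does: the fields are kilobyte-ticks,
so dividing by the number of statclock ticks charged to the process
gives an average size in kilobytes. using stathz as the tick rate here
is my assumption:

#include <sys/time.h>
#include <sys/resource.h>
#include <stdio.h>

static void
print_average_sizes(const struct rusage *ru, long stathz)
{
	long long ticks;

	/* total statclock ticks charged to the process */
	ticks = stathz * (ru->ru_utime.tv_sec + ru->ru_stime.tv_sec) +
	    stathz * (ru->ru_utime.tv_usec + ru->ru_stime.tv_usec) / 1000000;
	if (ticks == 0)
		return;

	printf("average shared text size: %lld KB\n", ru->ru_ixrss / ticks);
	printf("average data size: %lld KB\n", ru->ru_idrss / ticks);
	printf("average stack size: %lld KB\n", ru->ru_isrss / ticks);
}
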
Index: sys/proc.h
===================================================================
RCS file: /cvs/src/sys/sys/proc.h,v
diff -u -p -r1.376 proc.h
--- sys/proc.h 22 Oct 2024 11:54:05 -0000 1.376
+++ sys/proc.h 8 Dec 2024 12:09:11 -0000
@@ -95,11 +95,20 @@ struct pgrp {
* generation counter. Code should use tu_enter() and tu_leave() for this.
* The process ps_tu structure is locked by the ps_mtx.
*/
+#define TU_UTICKS 0 /* Statclock hits in user mode. */
+#define TU_STICKS 1 /* Statclock hits in system mode. */
+#define TU_ITICKS 2 /* Statclock hits processing intr. */
+#define TU_TICKS_COUNT 3
+
struct tusage {
uint64_t tu_gen; /* generation counter */
- uint64_t tu_uticks; /* Statclock hits in user mode. */
- uint64_t tu_sticks; /* Statclock hits in system mode. */
- uint64_t tu_iticks; /* Statclock hits processing intr. */
+ uint64_t tu_ticks[TU_TICKS_COUNT];
+#define tu_uticks tu_ticks[TU_UTICKS]
+#define tu_sticks tu_ticks[TU_STICKS]
+#define tu_iticks tu_ticks[TU_ITICKS]
+ uint64_t tu_ixrss;
+ uint64_t tu_idrss;
+ uint64_t tu_isrss;
struct timespec tu_runtime; /* Realtime. */
};
Index: kern/kern_clock.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_clock.c,v
diff -u -p -r1.124 kern_clock.c
--- kern/kern_clock.c 8 Jul 2024 13:17:11 -0000 1.124
+++ kern/kern_clock.c 8 Dec 2024 12:09:11 -0000
@@ -49,6 +49,7 @@
#include <sys/sysctl.h>
#include <sys/sched.h>
#include <sys/timetc.h>
+#include <uvm/uvm_extern.h>
/*
* Clock handling routines.
@@ -267,6 +268,8 @@ statclock(struct clockrequest *cr, void
struct schedstate_percpu *spc = &ci->ci_schedstate;
struct proc *p = curproc;
struct process *pr;
+ int tu_tick = -1;
+ int cp_time;
if (statclock_is_randomized) {
count = clockrequest_advance_random(cr, statclock_min,
@@ -281,13 +284,8 @@ statclock(struct clockrequest *cr, void
* Came from user mode; CPU was in user state.
* If this process is being profiled record the tick.
*/
- tu_enter(&p->p_tu);
- p->p_tu.tu_uticks += count;
- tu_leave(&p->p_tu);
- if (pr->ps_nice > NZERO)
- spc->spc_cp_time[CP_NICE] += count;
- else
- spc->spc_cp_time[CP_USER] += count;
+ tu_tick = TU_UTICKS;
+ cp_time = (pr->ps_nice > NZERO) ? CP_NICE : CP_USER;
} else {
/*
* Came from kernel mode, so we were:
@@ -303,26 +301,39 @@ statclock(struct clockrequest *cr, void
* in ``non-process'' (i.e., interrupt) work.
*/
if (CLKF_INTR(frame)) {
- if (p != NULL) {
- tu_enter(&p->p_tu);
- p->p_tu.tu_iticks += count;
- tu_leave(&p->p_tu);
- }
- spc->spc_cp_time[spc->spc_spinning ?
- CP_SPIN : CP_INTR] += count;
+ tu_tick = TU_ITICKS;
+ cp_time = CP_INTR;
} else if (p != NULL && p != spc->spc_idleproc) {
- tu_enter(&p->p_tu);
- p->p_tu.tu_sticks += count;
- tu_leave(&p->p_tu);
- spc->spc_cp_time[spc->spc_spinning ?
- CP_SPIN : CP_SYS] += count;
+ tu_tick = TU_STICKS;
+ cp_time = CP_SYS;
} else
- spc->spc_cp_time[spc->spc_spinning ?
- CP_SPIN : CP_IDLE] += count;
+ cp_time = CP_IDLE;
+
+ if (spc->spc_spinning)
+ cp_time = CP_SPIN;
}
+ spc->spc_cp_time[cp_time] += count;
+
if (p != NULL) {
p->p_cpticks += count;
+
+ if (!ISSET(p->p_flag, P_SYSTEM) && tu_tick != -1) {
+ struct vmspace *vm = p->p_vmspace;
+ struct tusage *tu = &p->p_tu;
+
+ tu_enter(tu);
+ tu->tu_ticks[tu_tick] += count;
+
+ /* maxrss is handled by uvm */
+ if (tu_tick != TU_ITICKS) {
+ tu->tu_ixrss += (vm->vm_tsize << (PAGE_SHIFT - 10)) * count;
+ tu->tu_idrss += (vm->vm_dused << (PAGE_SHIFT - 10)) * count;
+ tu->tu_isrss += (vm->vm_ssize << (PAGE_SHIFT - 10)) * count;
+ }
+ tu_leave(tu);
+ }
+
/*
* schedclock() runs every fourth statclock().
*/
Index: kern/kern_exit.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_exit.c,v
diff -u -p -r1.239 kern_exit.c
--- kern/kern_exit.c 15 Oct 2024 13:49:26 -0000 1.239
+++ kern/kern_exit.c 8 Dec 2024 12:09:11 -0000
@@ -346,6 +346,9 @@ exit1(struct proc *p, int xexit, int xsi
* and calculate the total times.
*/
calcru(&pr->ps_tu, &rup->ru_utime, &rup->ru_stime, NULL);
+ rup->ru_ixrss = pr->ps_tu.tu_ixrss;
+ rup->ru_idrss = pr->ps_tu.tu_idrss;
+ rup->ru_isrss = pr->ps_tu.tu_isrss;
ruadd(rup, &pr->ps_cru);
/*
Index: kern/kern_resource.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_resource.c,v
diff -u -p -r1.93 kern_resource.c
--- kern/kern_resource.c 10 Nov 2024 06:45:36 -0000 1.93
+++ kern/kern_resource.c 8 Dec 2024 12:09:11 -0000
@@ -395,6 +395,9 @@ tuagg_sumup(struct tusage *tu, const str
tu->tu_uticks += tmp.tu_uticks;
tu->tu_sticks += tmp.tu_sticks;
tu->tu_iticks += tmp.tu_iticks;
+ tu->tu_ixrss += tmp.tu_ixrss;
+ tu->tu_idrss += tmp.tu_idrss;
+ tu->tu_isrss += tmp.tu_isrss;
timespecadd(&tu->tu_runtime, &tmp.tu_runtime, &tu->tu_runtime);
}
@@ -440,6 +443,7 @@ tuagg_add_process(struct process *pr, st
/* Now reset CPU time usage for the thread. */
timespecclear(&p->p_tu.tu_runtime);
p->p_tu.tu_uticks = p->p_tu.tu_sticks = p->p_tu.tu_iticks = 0;
+ p->p_tu.tu_ixrss = p->p_tu.tu_idrss = p->p_tu.tu_isrss = 0;
}
void
@@ -567,6 +571,10 @@ dogetrusage(struct proc *p, int who, str
}
calcru(&tu, &rup->ru_utime, &rup->ru_stime, NULL);
+
+ rup->ru_ixrss = tu.tu_ixrss;
+ rup->ru_idrss = tu.tu_idrss;
+ rup->ru_isrss = tu.tu_isrss;
break;
case RUSAGE_THREAD:
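
an untested sketch for eyeballing the new accounting from userland:
fault in a large allocation, burn some cpu so statclock gets a chance
to sample it, and compare the rusage fields before and after:

#include <sys/time.h>
#include <sys/resource.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
	struct rusage before, after;
	size_t len = 64UL * 1024 * 1024;
	volatile unsigned long spin;
	char *p;

	if (getrusage(RUSAGE_SELF, &before) == -1)
		return (1);

	if ((p = malloc(len)) == NULL)
		return (1);
	memset(p, 0xa5, len);		/* fault the pages in */

	/* spin a bit so statclock samples the enlarged rss */
	for (spin = 0; spin < 500000000UL; spin++)
		continue;

	if (getrusage(RUSAGE_SELF, &after) == -1)
		return (1);

	printf("maxrss: %ld -> %ld KB\n", before.ru_maxrss, after.ru_maxrss);
	printf("idrss:  %ld -> %ld\n", before.ru_idrss, after.ru_idrss);
	printf("isrss:  %ld -> %ld\n", before.ru_isrss, after.ru_isrss);

	free(p);
	return (0);
}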