
From: Kirill A. Korinsky <kirill@korins.ky>
Subject: Re: powersave CPU policy
To: tech@openbsd.org
Date: Tue, 18 Jun 2024 13:05:25 +0100


Hi,

On Tue, 18 Jun 2024 10:11:32 +0100,
Martin Pieuchot <mpi@openbsd.org> wrote:
>

[...]

> > With two changes:
> > 
> > 1. Unused CPU cores are halted and moved to deepest C-state to save some
> >    power.
> 
> The hard question here is how do we defined "unused".  I know that your
> diff use `CP_IDLE' but I'd like to be sure there isn't a better
> alternative and in 2024 I believe there is already prior works we can
> learn from.
>
> See the problem with CP_IDLE is that it is another subsystem which is
> "guessing" what the CPU usage is.  So, yes your joint work with solene@
> might be an improvement over what is currently in tree, sadly it is
> based on the same bias.  I believe we can do better.

And the next question is what to do with a proc that is restricted to a
specific CPU core?

Frankly, this question kills the idea of halting CPU cores: handling it
makes things much more complicated.

So I've abandoned this approach as too complicated and dangerous.

> This is the second hard question which depends on the machine (laptop vs
> desktop), its usage, the generation of CPU, the kind of CPU (big vs
> LITTLE), its architecture...  We cannot have a subsystem "designed" only
> for modernish x86 laptops with turbo mode CPUs.
> 
> Plus we cannot add another sysctl like that.  I understand why you did it.
> Sadly this is a workaround for the old hw.setperf interface.  I doubt this
> interface is still relevant in 2024 when it comes to improving power
> consumption.
>

Just to put the issue with the current auto mode on the table: it switches
to 100% CPU frequency when connected to an AC adapter. The machine I am
using right now has an interesting bug: at 100% it needs more power than it
can draw from the AC adapter, so it drains the battery a bit as a result.
Yes, it doesn't charge on the AC adapter with the default auto policy. The
machine has an Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz.

> I don't understand why it seems safer.  My point is that the IDLE thread
> is designed like the following loop:
> 
> 	while (cpu_has_nothing_enqueued(ci))
> 		cpu_do_nothing_and_save_energie(ci);
> 
> This is bad for what you want to achieve and for what I want to achieve.
> Instead I'd rather see a loop like:
> 
> 	while (cpu_is_not_needed(ci))
> 		cpu_do_nothing_and_save_energie(ci);
> 
> So the new question is when do we need a CPU and at which frequency do
> we need it?
> 
> To me this is something that has to be designed from scratch based on
> existing state of art.  Building something on top of the hw.setperf
> mechanism is not going work.
> 

I agree, and would like to share a naive implementation of a Linux
ondemand-like policy. I'm not suggesting committing this diff; it's more
like another proof of concept which I'd like to share.

I used the same spc_cp_time because it's already there and allows
implementing such a PoC with minimal changes.

Interestingly, on my machine the battery lasts as long as with the previous
PoC, but this time without any impact on system responsiveness.

The next logical step for this PoC is changing the frequency per CPU, which
will require some code rework.

diff --git sys/kern/sched_bsd.c sys/kern/sched_bsd.c
index 54ce3173e4e..88edece0e1f 100644
--- sys/kern/sched_bsd.c
+++ sys/kern/sched_bsd.c
@@ -580,12 +580,11 @@ void
 setperf_auto(void *v)
 {
 	static uint64_t *idleticks, *totalticks;
-	static int downbeats;
 	int i, j = 0;
-	int speedup = 0;
+	int boost = 0, load = 0, speed = 0;
 	CPU_INFO_ITERATOR cii;
 	struct cpu_info *ci;
-	uint64_t idle, total, allidle = 0, alltotal = 0;
+	uint64_t idle, total;
 
 	if (perfpolicy != PERFPOL_AUTO)
 		return;
@@ -593,11 +592,6 @@ setperf_auto(void *v)
 	if (cpu_setperf == NULL)
 		return;
 
-	if (hw_power) {
-		speedup = 1;
-		goto faster;
-	}
-
 	if (!idleticks)
 		if (!(idleticks = mallocarray(ncpusfound, sizeof(*idleticks),
 		    M_DEVBUF, M_NOWAIT | M_ZERO)))
@@ -607,36 +601,48 @@ setperf_auto(void *v)
 		    M_DEVBUF, M_NOWAIT | M_ZERO))) {
 			free(idleticks, M_DEVBUF,
 			    sizeof(*idleticks) * ncpusfound);
+			idleticks = NULL;
 			return;
 		}
+
 	CPU_INFO_FOREACH(cii, ci) {
 		if (!cpu_is_online(ci))
 			continue;
+
 		total = 0;
-		for (i = 0; i < CPUSTATES; i++) {
+		for (i = 0; i < CPUSTATES; i++)
 			total += ci->ci_schedstate.spc_cp_time[i];
-		}
 		total -= totalticks[j];
+		if (total <= 0)
+			continue;
+
 		idle = ci->ci_schedstate.spc_cp_time[CP_IDLE] - idleticks[j];
-		if (idle < total / 3)
-			speedup = 1;
-		alltotal += total;
-		allidle += idle;
+
 		idleticks[j] += idle;
 		totalticks[j] += total;
+
+		load = (100 * (total - idle)) / total;
+
+		if (load > 80)
+			boost = 1;
+		else if (load > speed)
+			speed = load;
+
 		j++;
 	}
-	if (allidle < alltotal / 2)
-		speedup = 1;
-	if (speedup && downbeats < 5)
-		downbeats++;
 
-	if (speedup && perflevel != 100) {
-faster:
+	if (boost && perflevel != 100) {
 		perflevel = 100;
 		cpu_setperf(perflevel);
-	} else if (!speedup && perflevel != 0 && --downbeats <= 0) {
-		perflevel = 0;
+	} else if (!boost && speed > perflevel && perflevel != 100) {
+		perflevel += 5;
+		if (perflevel > 100)
+		    perflevel = 100;
+		cpu_setperf(perflevel);
+	} else if (!boost && speed < perflevel && perflevel != 0) {
+		perflevel -= 5;
+		if (perflevel < 0)
+		    perflevel = 0;
 		cpu_setperf(perflevel);
 	}
 


-- 
wbr, Kirill