From: Martin Pieuchot Subject: Re: powersave CPU policy To: tech@openbsd.org Date: Tue, 18 Jun 2024 11:11:32 +0200 On 15/06/24(Sat) 15:53, Kirill A. Korinsky wrote: > Hi, > > Thanks for good questions, it took some time to answer. > > I've inlined an updated version of the patch in this email, which applied > correctly to the latest version from the git mirror > (e90eceb0db08f80f46bf1fa7804bb20dcd9c55eb), which matches today's snapshot. > > On Wed, 12 Jun 2024 11:34:17 +0100, > Martin Pieuchot wrote: > > > > On 03/06/24(Mon) 21:28, Kirill A. Korinsky wrote: > > > On Mon, 03 Jun 2024 17:44:27 +0100, > > > Mark Kettenis wrote: > > > > > > > > So my take on this is: not now! I'm in the same area to improve the > > > > suspend-to-idle code that was recently committed. > > > > > > > > Also, the diff does many things that should probably be separate > > > > changes. Also, this may work for your machine, and for your use case, > > > > but it needs to work for everybody and across architectures. > > > > > > > > And while the idea of turning off cores that aren't used, in the end > > > > this depends on the scheduler to give you the right information. And > > > > there are plans to make some drastic changes in that area as well. > > > > > > > > If I understand your diff correctly it is about dynamically removing a > > bunch of CPUs from the scheduler in order to place them in the deepest > > sleep state? > > > > Is your work based on some previous state of the art? How did you chose > > the number in your algorithm? > > > > This work is based on Solene's patch, which she announced at > https://marc.info/?l=openbsd-tech&m=163259444331471&w=2 , which she > benchmarked, and the results of which she published on her blog > https://dataswamp.org/~solene/2021-09-26-openbsd-power-usage.html . Thanks. I wasn't aware of this. To me this is a good PoC that something should be done. However this isn't something that I'd like to be committed as it is. This change duplicates a piece of code which is already black magic and not well adapted. So please let's try to improve/fix it instead! > With two changes: > > 1. Unused CPU cores are halted and moved to deepest C-state to save some > power. The hard question here is how do we defined "unused". I know that your diff use `CP_IDLE' but I'd like to be sure there isn't a better alternative and in 2024 I believe there is already prior works we can learn from. See the problem with CP_IDLE is that it is another subsystem which is "guessing" what the CPU usage is. So, yes your joint work with solene@ might be an improvement over what is currently in tree, sadly it is based on the same bias. I believe we can do better. > 2. It uses powersavelimit settings instead of 100 as active CPU performance. > > The (2) is based on the observation that 100 means "turbo mode", which means > bigger power consumption, and 75, which is default setting, is something > that matches reported CPU frequency by apm with what Intel uses in CPU name. > > Thus, dedicated CPU policy allows to tune it in the future, with limited > impact only to people who use it. This is the second hard question which depends on the machine (laptop vs desktop), its usage, the generation of CPU, the kind of CPU (big vs LITTLE), its architecture... We cannot have a subsystem "designed" only for modernish x86 laptops with turbo mode CPUs. Plus we cannot add another sysctl like that. I understand why you did it. Sadly this is a workaround for the old hw.setperf interface. I doubt this interface is still relevant in 2024 when it comes to improving power consumption. > > I have a related question. acpicpu_idle() currently checks if a CPU is > > idle via cpu_is_idle(). If the CPU is removed from the scheduler, this > > will always be true. I'm annoyed with these checks because they rely on > > distributing runnable threads to CPUs in advance. Do you have an idea > > on how to remove those checks from the various *idle() routines without > > compromising reactivity? > > I see that acpicpu_suspend was introduced last week, which allows me to > avoid changes inside acpicpu and move the switch inside scheduler to use > cpu_suspend_cycle_fcn on arch that supports it. Which seems much safer. I don't understand why it seems safer. My point is that the IDLE thread is designed like the following loop: while (cpu_has_nothing_enqueued(ci)) cpu_do_nothing_and_save_energie(ci); This is bad for what you want to achieve and for what I want to achieve. Instead I'd rather see a loop like: while (cpu_is_not_needed(ci)) cpu_do_nothing_and_save_energie(ci); So the new question is when do we need a CPU and at which frequency do we need it? To me this is something that has to be designed from scratch based on existing state of art. Building something on top of the hw.setperf mechanism is not going work.