From: Philip Guenther <guenther@gmail.com>
Subject: Re: amd64/fpu: Avoid multiple FPU resets
To: Mark Kettenis <mark.kettenis@xs4all.nl>
Cc: deraadt@openbsd.org, cludwig@genua.de, tech@openbsd.org
Date: Sun, 15 Jun 2025 09:07:07 -0700

On Sun, Jun 15, 2025 at 1:24 AM Mark Kettenis <mark.kettenis@xs4all.nl> wrote:
>
> > From: Philip Guenther <guenther@gmail.com>
> > Date: Sat, 14 Jun 2025 21:27:20 -0700
>
> Hi Philip,
>
> > On Fri, Jun 13, 2025 at 5:49 PM Theo de Raadt <deraadt@openbsd.org> wrote:
> > >
> > > Theo de Raadt <deraadt@openbsd.org> wrote:
> > >
> > > > Mark Kettenis <mark.kettenis@xs4all.nl> wrote:
> > > >
> > > > > > From: Philip Guenther <guenther@gmail.com>
> > > > > > Date: Fri, 13 Jun 2025 11:37:03 -0700
> > > > > >
> > > > > > On Fri, Jun 13, 2025 at 5:32 AM Christian Ludwig <cludwig@genua.de> wrote:
> > > > > > > All kernel crypto code follows the scheme:
> > > > > > >
> > > > > > >         for (objects) {
> > > > > > >                 fpu_kernel_enter();
> > > > > > >                 ...
> > > > > > >                 fpu_kernel_exit();
> > > > > > >         }
> > > > > > >
> > > > > > > In every iteration, fpu_kernel_exit() resets the FPU state and
> > > > > > > fpu_kernel_enter() resets it, again. FPU resets are expensive on some
> > > > > > > platforms. Doing the operation twice per loop iteration is clearly not
> > > > > > > necessary.
> > > > > > >
> > > > > > > The FPU is always in one of two states when we reach fpu_kernel_enter().
> > > > > > > It either holds user state when CPUPF_USERXSTATE is set or it is in
> > > > > > > reset state already. The context switching code and signal code follow
> > > > > > > this assumption, too. So we can simply drop resetting the FPU in
> > > > > > > fpu_kernel_enter() when it does not hold user state.
> > > > > >
> > > > > > Hmm, yes, all the places that clear CPUF_USERXSTATE reset the state.
> > > > > >
> > > > > > Does fpu_kernel_enter() get used from interrupt context?  Do we have
> > > > > > to worry about an interrupt occurring between the clearing of the flag
> > > > > > and the resetting of the state?
> > > > >
> > > > > I'm pretty sure the intention was that you can onlu use
> > > > > fpu_kernel_enter() from process context.  But I don't think we
> > > > > documented this anywhere and we don't enforce this.  Should we stick
> > > > > an assertwaitok() in there?
> > > >
> > > > My point is has this covered pu some bugs.
> > >
> > >      My point is has this covered up some bugs? [that is a question]
> >
> > I suspect there's a very small possibility of an amdgpu interrupt
> > occurring during critical sections in sendsig() or sys_sigreturn()
> > resulting in the user thread's FPU space being reset unexpectedly, and
> > a slightly larger possibility of such an interrupt occurring during
> > any kernel process-context FPU usage (IPsec?  Softraid?) corrupting
> > that kernel computation.
> >
> > Anyone using IPsec or softraid on their amdgpu system running X?
> > Experience random IPsec or softraid corruption?
> >
> >
> > So, given two goals: (1) permit safe fpu use by one-level of
> > interrupts (amdgpu), and (2) avoid unnecessary fpuresets (what started
> > this thread), I thinking we need 4 states:
> > 1) has user xstate (the current CPUF_USERXSTATE flag): CPU registers
> > are master copy of the user thread's xstate
> > 2) unknown xstate: FPU may be dirty or still have weird mode and
> > exception bits set by userspace
> > 3) has safe, but unused xstate: fpureset() has been used since last in userspace
> > 4) has kernel-proc xstate: CPU register are master copy of a kernel
> > process-context operation
> >
> > and then an additional bit to hold "already in an interrupt using the
> > FPU: panic if a nested interrupt tries to use the FPU (no where to
> > save the xstate!)", possibly in the form of a const char* pointer to
> > the __FILE__  of the outer interrupt, so we can see who's interrupting
> > whom.
> >
> > fpu_kernel_enter() would
> >  * assertwaitok()
> >  * if state:
> >    1) fpusave, set state 2, fpureset(), set state 4
> >    2) fpureset(), set state 4
> >    3) set state 4
> >    4) panic: recursive process-content FPU usage
> >
> > fpu_kernel_exit() would
> >  * assert state 4
> >  * set state 3
> >
> > New fpu_kernel_enter_intr(file = __FILE__) would:
> >  * assert curcpu()->ci_intr_fpu == NULL
> >  * curcpu()->ci_intr_fpu = file
> >  * if state:
> >    1) fpusave, fpureset(), set state 3
> >    2) fpureset(), set state 3
> >    3) do nothing
> >    4) fpusave()
> >
> > New fpu_kernel_exit_intr() would:
> >  * assert curcpu()->ci_intr_fpu != NULL
> >  * if state:
> >    1 or 2) panic
> >    3) do nothing
> >    4) xstor
> >  * curcpu()->ci_intr_fpu = NULL;
> >
> >
> > And everywhere clearing USERXSTATE needs to fpusave() _first_, then
> > clear it (that's the 1->2 transition), and only then possibly do
> > fpureset() to get to state 3 if it needs it (sendsig(), for example)
> >
> >
> > Does that make sense?  Should I push around a full diff or is there a
> > clearly wrong understanding of the kernel FPU usages (use from nested
> > interrupts?) that mean this is not sufficient?
>
> That sounds like a lot of complexity.

Handling both process and interrupt contexts would add some, of course.


> The idea that the kernel should only be allowed to use the FPU in
> process context isn't an arbitrary choice.  The Linux kernel has the
> same rule:
>
>   https://docs.kernel.org/core-api/floating-point.html#runtime-api
>
> Therefore I don't expect the amdgpu code to be using the FPU from
> interrupt context.  That would be a bug.

Okay, then I think the original diff is correct, though maybe we add
the assertwaitok() to fpu_kernel_enter() so we remain confident of
that.

Since the kernel's FPU usage is relatively limited in registers it
touches, there would still be a potential gain to have
fpu_kernel_exit() only clear the FPU registers we use, if that's
cheaper than fpureset(), perhaps VZEROALL?  For paranoia we could have
a bit to track whether the kernel used the FPU so a full reset could
be done in paths like sendsig()/setexec() that want to guarantee a
fully clean FPU state before setting CPUF_USERXSTATE.


Philip Guenther