Index | Thread | Search

From:
Mark Kettenis <mark.kettenis@xs4all.nl>
Subject:
Re: Fix nested copy(9) for btrace's ustack
To:
Martin Pieuchot <mpi@grenadille.net>
Cc:
deraadt@openbsd.org, cludwig@genua.de, tech@openbsd.org
Date:
Mon, 10 Mar 2025 21:29:27 +0100

Download raw body.

Thread
> Date: Mon, 10 Mar 2025 20:52:23 +0100
> From: Martin Pieuchot <mpi@grenadille.net>
> 
> On 07/03/25(Fri) 09:14, Theo de Raadt wrote:
> > It seems like a huge change in behaviour to have nested operations here.
> > 
> > I'm suspicious that it only requires these two changes, and anticipate
> > the entire mechanism can't be moved from singular to nested with just
> > this.  That prompts the questions:  How did we get here, and is this
> > viable recursion???  I strongly suspect no.
> 
> I believe we get there because some ddb/dtrace code is now relying on
> copyin(9) which overrides `pcb_onfault'.  I don't think using copyin(9)
> inside ddb/dtrace code is viable.  We cannot inspect the kernel it we
> rely on its APIs.

The problem is the copyin(9) call in stacktrace_save_utrace() isn't
it?  That is actually about inspecting userland, not the kernel.

Anyway, the problem here really is calling copyin(9) from interrupt
context; we never designed the kernel to allow that.  You half-fixed
the problem by setting ci_inatomic, but that isn't the whole story.
So yes I agree that using copyin(9) in this context is wrong.

> Christian, could we use a ddb/dtrace version of copyin(9) that would not
> mess with `pcb_onfault'?

There are two ways to fix this I think:

* Check pcb_onfault and skip the copyin(9) call if it is already set.

* Save pcb_onfault before the copyin(9) call and restore it
  afterwards.  Probably need to set it to NULL before the copyin(9)
  call.

The last option is probably better since otherwise userland backtraces
will terminate early whenever a system call is doing a copyin(9),
which would create a significant blindspot.

Probably should create a wrapper and maybe call _copyin() instead of
copyin() in that wrapper.  Best to move the ci_inatomic frobbing in
there as well.