Download raw body.
SYS_pinsyscalls question
> From: Mateusz Guzik <mjguzik@gmail.com>
> Date: Sun, 23 Feb 2025 23:54:50 +0100
>
> On Sun, Feb 23, 2025 at 8:50 PM Theo de Raadt <deraadt@openbsd.org> wrote:
> >
> > Mateusz Guzik <mjguzik@gmail.com> wrote:
> >
> > > On Sun, Feb 23, 2025 at 5:21 PM Theo de Raadt <deraadt@openbsd.org> wrote:
> > > >
> > > > > Then this copyin:
> > > > > error = copyin(SCARG(uap, pins), pins, npins * sizeof(u_int));
> > > > > if (error)
> > > > > goto err;
> > > > >
> > > > > will go off cpu and drop the kernel lock, allowing other threads to
> > > > > reach the same point, all still observing pr->ps_libcpin.pn_start == 0.
> > > > >
> > > > > Then several threads will reach this assignment:
> > > > > pr->ps_libcpin.pn_start = base;
> > > > >
> > > > > is going to override all previously set bufs, leaking them.
> > > >
> > > > Yes, would need very custom crt0 in those static binaries.
> > > >
> > >
> > > Presumably someone wanting to mess with the kernel would go there.
> > >
> > > > This diff should allow the first thread to do pinsyscalls to win, and
> > > > re-check the condition post-sleep so that others lose and free their
> > > > pintable. EPERM matches the existing spec in the man page.
> > > >
> > >
> > > Another angle is what happens when one thread is messing about with
> > > installing the table, while another is issuing syscalls. Note there is
> > > no synchronization between this and pin_check. It may be this does not
> > > blow up in practice at the moment, but I don't think depending on it
> > > is viable.
> >
> > That is not a concern. system calls which start before pinsyscalls
> > completes setup, work as expected.
> >
>
> Even if that's considered fine, there is a transition period where the
> table is just being installed. The code needs correct memory fences on
> both sides to make sure this works as expected.
Well, you're right that there is a potential race between threads
here. Although that can only happen if you write your own version of
ld.so that creates threads before pinning down the syscalls using
pinsyscalls(2).
The pinsyscals(2) code does
pr->ps_libcpin.pn_start = base;
pr->ps_libcpin.pn_end = base + len;
pr->ps_libcpin.pn_pins = pins;
pr->ps_libcpin.pn_npins = npins;
but another thread may observe those assignments in a different order
because there are no (memory) barriers. Fortunately the code that
checks the libc.so syscall range explicitly checks pn_pins, so there
is no risk of a kernel null-pointer dereference. But it does mean
that the address or syscall number checks in pin_check() are done in a
somewhat inconsistent way.
I think the worst case is when the thread sees the correct pn_end,
pn_pins and pn_npins, but pn_start is still zero. That could
potentially permit a syscall that shouldn't be allowed.
The diff below would fix the potential races. It adds a memory
barrier that makes sure pn_start, pn_end and pn_npins become visible
before pn_pins. Then in pin_check() we also need a memory barrier to
make sure pn_pins gets loaded before the other variable.
This shouldn't have a significant impact on performance. The
membar_consumer() barrier is probably the cheapest of the bunch and on
amd64 it is implemented as a no-op anyway.
But at the same time, this is a solution for a problem that doesn't
really exist in reality. If someone really writes an ld.so
replacement that starts threads beforing calling pinsyscalls(2) they
deserve what they get.
Index: sys/syscall_mi.h
===================================================================
RCS file: /cvs/src/sys/sys/syscall_mi.h,v
diff -u -p -r1.37 syscall_mi.h
--- sys/syscall_mi.h 27 Dec 2024 11:57:16 -0000 1.37
+++ sys/syscall_mi.h 2 Mar 2025 21:07:00 -0000
@@ -57,6 +57,7 @@ pin_check(struct proc *p, register_t cod
extern char sigcodecall[], sigcoderet[], sigcodecall[];
struct pinsyscall *pin = NULL, *ppin, *plibcpin;
struct process *pr = p->p_p;
+ u_int *libcpins;
vaddr_t addr;
int error = 0;
@@ -73,7 +74,9 @@ pin_check(struct proc *p, register_t cod
* 2b) dynamic binary: syscalls in ld.so (in the ps_pin region)
* 3) sigtramp, containing only sigreturn(2)
*/
- if (plibcpin->pn_pins &&
+ libcpins = READ_ONCE(plibcpin->pn_pins);
+ membar_consumer();
+ if (libcpins &&
addr >= plibcpin->pn_start && addr < plibcpin->pn_end)
pin = plibcpin;
else if (ppin->pn_pins &&
Index: uvm/uvm_mmap.c
===================================================================
RCS file: /cvs/src/sys/uvm/uvm_mmap.c,v
diff -u -p -r1.197 uvm_mmap.c
--- uvm/uvm_mmap.c 28 Feb 2025 06:26:18 -0000 1.197
+++ uvm/uvm_mmap.c 2 Mar 2025 21:07:00 -0000
@@ -653,8 +653,9 @@ err:
}
pr->ps_libcpin.pn_start = base;
pr->ps_libcpin.pn_end = base + len;
- pr->ps_libcpin.pn_pins = pins;
pr->ps_libcpin.pn_npins = npins;
+ membar_producer();
+ WRITE_ONCE(pr->ps_libcpin.pn_pins, pins);
#ifdef PMAP_CHECK_COPYIN
/* Assume (and insist) on libc.so text being execute-only */
SYS_pinsyscalls question