Download raw body.
Theo de Raadt <deraadt@openbsd.org> wrote:
> Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> > On Sun, Feb 23, 2025 at 5:21 PM Theo de Raadt <deraadt@openbsd.org> wrote:
> > >
> > > > Then this copyin:
> > > > error = copyin(SCARG(uap, pins), pins, npins * sizeof(u_int));
> > > > if (error)
> > > > goto err;
> > > >
> > > > will go off cpu and drop the kernel lock, allowing other threads to
> > > > reach the same point, all still observing pr->ps_libcpin.pn_start == 0.
> > > >
> > > > Then several threads will reach this assignment:
> > > > pr->ps_libcpin.pn_start = base;
> > > >
> > > > is going to override all previously set bufs, leaking them.
> > >
> > > Yes, would need very custom crt0 in those static binaries.
> > >
> >
> > Presumably someone wanting to mess with the kernel would go there.
> >
> > > This diff should allow the first thread to do pinsyscalls to win, and
> > > re-check the condition post-sleep so that others lose and free their
> > > pintable. EPERM matches the existing spec in the man page.
> > >
> >
> > Another angle is what happens when one thread is messing about with
> > installing the table, while another is issuing syscalls. Note there is
> > no synchronization between this and pin_check. It may be this does not
> > blow up in practice at the moment, but I don't think depending on it
> > is viable.
>
> That is not a concern. system calls which start before pinsyscalls
> completes setup, work as expected.
To be more clear, this strategy was recently taken with pledge:
CVSROOT: /cvs
Module name: src
Changes by: deraadt@cvs.openbsd.org 2025/02/10 09:45:46
Modified files:
sys/sys : proc.h
sys/kern : kern_event.c kern_pledge.c
sys/dev/vmm : vmm.c
Log message:
A syzkaller report was diagnosed by semarie, and found a namei-related
sleeping system call which was re-inspecting p->p_p->ps_pledge in one thread,
after another thread had reduced the promises by calling pledge(), with
promises which would have prevented that syscall from being called in
the first place. This inconsistant promise view is dangerous. So let's
change pledge semantics a tiny bit: We copy the per-process p_p->ps_pledge
value to per-thread p_pledge at invocation of each system call, so that the
configuration is stable.
This method avoids increasing the cost of pledge checks.
ok claudio kettenis semarie
Basically, sys_pledge() sets process flags very late. A system call
performed in a different process before that completion, is operating
under the old rules, and allowed to enter the kernel and will then run
to completion.
It should be the same with pinsyscalls. Any system call entry attempted
by a different thread before pinsyscalls() completes, will succeed.
That's fine. There is no interlock needed, because we are not trying
to solve time travel.