
From:
Claudio Jeker <cjeker@diehard.n-r-g.com>
Subject:
Re: patch: relax ni_pledge panic
To:
Theo de Raadt <deraadt@openbsd.org>, Mark Kettenis <mark.kettenis@xs4all.nl>, semarie@kapouay.eu.org, tech@openbsd.org
Date:
Tue, 18 Feb 2025 14:00:39 +0100


Thread
  • Theo de Raadt:

    patch: relax ni_pledge panic

  • On Mon, Feb 17, 2025 at 03:40:28PM +0100, Martin Pieuchot wrote:
    > On 10/02/25(Mon) 13:02, Claudio Jeker wrote:
    > > On Mon, Feb 10, 2025 at 12:32:52PM +0100, Martin Pieuchot wrote:
    > > > On 07/02/25(Fri) 17:47, Claudio Jeker wrote:
    > > > > On Thu, Feb 06, 2025 at 06:46:52PM +0100, Martin Pieuchot wrote:
    > > > > > On 06/02/25(Thu) 09:43, Theo de Raadt wrote:
    > > > > > > Mark Kettenis <mark.kettenis@xs4all.nl> wrote:
    > > > > > > 
    > > > > > > > > From: "Theo de Raadt" <deraadt@openbsd.org>
    > > > > > > > > Date: Thu, 06 Feb 2025 09:17:52 -0700
    > > > > > > > > 
    > > > > > > > > > [2] in another thread, pledge("stdio rpath wpath"), and returns.
    > > > > > > > > >    the process is now pledged.
    > > > > > > > > 
    > > > > > > > > How can this be allowed?
    > > > > > > > > 
    > > > > > > > > I am pretty sure sys_pledge should single-thread the process, which
    > > > > > > > > means it will wait until other threads complete their in-kernel sleeps.
    > > > > > > > 
    > > > > > > > I'm not sure claudio@ will agree with you ;)
    > > > > > > 
    > > > > > > He just agreed with me privately.
    > > > > > 
    > > > > > I'd rather see a rwlock be used to serialize access to the per-process
    > > > > > data structures.  I don't see any reason to use the single thread API
    > > > > > for this and I'd rather not spread its usage.  It is already a pain to
    > > > > > work with.
    > > > > 
    > > > > We need to fix that and this is what I'm slowly working at. The single
    > > > > thread API should be simple and allow a few syscalls to know that no other
    > > > > thread will disturb them. exec and exit are such cases but I think unveil
    > > > > and pledge fall into a similar category. They alter the behaviour of the
    > > > > process and should therefore change the state in a safe fashion.
    > > > 
    > > > I disagree with this approach.
    > > 
    > > I disagree with you on this one. The single thread API is one of the tools
    > > we have to make MP/MT code safe. In this case this tool fits well and solves
    > > an immediate problem. Hammering an rwlock around all of unveil is not
    > > practicable on the other hand.
    > 
    > The single thread API is IMHO not yet a tool.  It is an extremely fragile
    > subsystem which relies on a complex mutex & reference counting of
    > threads plus internal state inspection that relies on the SCHED_LOCK() and
    > that we do not want to spread.
    
    I very much disagree. The API is a very important tool. We need it for
    exec and exit, so it is heavily used. It is not fragile, at least not
    anymore, and the fragile bits were exclusive to SINGLE_SUSPEND, which is
    not used there for a good reason.
    
    I spent a lot of time on fixing this.
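
    Roughly, the pattern looks like this (a sketch only; p is the calling
    thread, pr its struct process, and the argument lists are simplified,
    see sys/proc.h for the real prototypes):

        /* Park all sibling threads at a safe point before the update. */
        if ((error = single_thread_set(p, SINGLE_UNWIND)) != 0)
                return (error);

        /* No other thread of the process runs here, so this cannot race. */
        pr->ps_pledge = promises;
        atomic_setbits_int(&pr->ps_flags, PS_PLEDGE);

        /* Let the sibling threads continue. */
        single_thread_clear(p);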
     
    > From a performance point of view it is by far the worst.
    
    That depends on your viewpoint. It is expensive to take a multithreaded
    process to single-thread execution, but only if the threads are all
    running. For a single-threaded process the API is free.
    
    The unveil(2) system call is best used early in a program, when threads
    are not yet spawned or are sleeping. Proper use of unveil(2) requires
    locking down the process with unveil(NULL, NULL); at that point the
    unveil data becomes immutable.
    
    So in the unveil(2) case there is a lot to be said for it.
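
    A minimal sketch of that pattern; the paths and promises below are just
    examples:

        #include <err.h>
        #include <unistd.h>

        int
        main(void)
        {
                /* Expose only what the program needs, before any threads exist. */
                if (unveil("/etc/resolv.conf", "r") == -1)
                        err(1, "unveil");
                if (unveil("/tmp", "rwc") == -1)
                        err(1, "unveil");

                /* Lock it down: from here on the unveil view is immutable. */
                if (unveil(NULL, NULL) == -1)
                        err(1, "unveil");
                if (pledge("stdio rpath wpath cpath", NULL) == -1)
                        err(1, "pledge");

                /* ... only now spawn threads and do the real work ... */
                return 0;
        }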
    
    > It also interacts with signals and states that are (ab)used by ptrace.  Plus
    > it is a subsystem that is under heavy development with multiple bugs and
    > locking dependencies that we haven't solved yet.  On top of that it is very
    > complicated and understood by only a few over-busy kernel developers.
    
    Again, this is no longer true. I have spent the last months making this
    work in all cases. Also the API will become a lot simpler since SINGLE_SUSPEND
    will die soon.
     
    > Finally it has no integration in WITNESS nor any way to assert that data
    > accesses are properly "protected".
    
    Yes, there is no WITNESS for this.
     
    > >                       Hammering an rwlock around all of unveil is not
    > > practicable on the other hand.
    > 
    > > This is completely false.  unveil is a relatively small subsystem
    > > composed of 3 fields shared among sibling threads in a process.  It is
    > not only practicable, it is doable starting with the write path
    > unveil_add().
    > 
    > Plus the subsystem already uses rwlocks...
    > 
    > I also believe this is a great place to start building knowledge about
    > the limits of the VFS in terms of data owners.
    > 
    > I also believe the complexity of unveil is small enough that a kernel
    > developer not experienced in locking can tackle it with some help.
    
    I was maybe a bit unclear. I said once that using the single_thread API
    here is a viable solution until something better is developed.
    Right now I think preventing the race via the single_thread API is a viable
    solution until somebody does the work to clean up unveil and properly
    unlock it.
    It is easy to say that the data structures are easy to lock, but we did not
    look at the call sites and consider whether an extra sleep point there would
    result in failures in other subsystems.
    
    I think if you want to fix unveil you also need to look at namei and
    understand vnodes.
     
    I'm not ready for that. I just have no time to start yet another huge
    project like this, but I know I will be pulled into this. So maybe I'm
    selfish here, but I think the pragmatic solution is good enough right now.
    
    > > Why can't we use all tools in the box? Why can we only use rwlocks and
    > > mutexes for our solutions?
    > 
    > Theo can do what he wants, it's his project.  I'm just sharing my
    > experience of 12 years working on parallelizing OpenBSD and what I
    > believe is better for the project, its developers and users.
    > 
    > > > > Since both pledge and unveil are almost never called it is better to
    > > > > optimise for the readers (which happen on all or many syscalls).
    > > > 
    > > > Indeed, a read/write lock is fine.
    > > 
    > > No it is not. rwlocks are not a golden ticket that makes all pain go away.
    > > They are massive bottlenecks even now that the most obvious problem has
    > > been fixed.
    > 
    > They are no golden tickets, they are rwlocks.  We know how to use them
    > and clearly document which fields are protected by them.  This allows us
    > to work as a team re-using the knowledge of a previous developer and
    > push the line of the lock a bit further.  They are well integrated in
    > WITNESS as well as in the %spin analysis tools.  We know how to split
    > contended locks or use alternate locking primitives when necessary.
    
    rwlocks actually evade a lot of the analysis tools, and contention is
    hard to detect. Until recently rwlocks were horribly inefficient. The fact
    that you can sleep with rwlocks held is nice but also a big problem, and
    the way the lock adds extra sleep points can introduce subtle bugs for
    people who are not as well versed in their use as you are.
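
    For reference, the rwlock pattern being proposed would look roughly like
    this; the lock and helper names are made up for illustration and are not
    code from the tree:

        #include <sys/rwlock.h>

        /*
         * Illustrative only: the lock would live in struct process next to
         * the unveil fields it protects; shown file-scope here for brevity.
         */
        struct rwlock uv_lock = RWLOCK_INITIALIZER("unveil");

        /* Write side, e.g. unveil_add(): rare, once per unveil(2) call. */
        void
        uv_update(void)
        {
                rw_enter_write(&uv_lock);
                /* ... modify the per-process unveil vector ... */
                rw_exit_write(&uv_lock);
        }

        /* Read side, e.g. the namei check: runs on many syscalls. */
        void
        uv_check(void)
        {
                rw_enter_read(&uv_lock);
                /* ... consult the per-process unveil vector ... */
                rw_exit_read(&uv_lock);
        }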
    
    Writing thread-safe code is hard, and it is extra hard if performance matters.
    -- 
    :wq Claudio
    
    
    