Download raw body.
fstat sysctl crashes kernel
On Thu, Aug 08, 2024 at 01:21:54PM +0200, Alexander Bluhm wrote:
> On Thu, Aug 08, 2024 at 02:03:19AM +0300, Vitaliy Makkoveev wrote:
> > > On 8 Aug 2024, at 01:49, Alexander Bluhm <bluhm@openbsd.org> wrote:
> > >
> > > Hi,
> > >
> > > There is this long lasting problem that fstat(1) in a loop during
> > > heavy network load can crash the kernel. Bug is that while traversing
> > > allprocess in sysctl KERN_FILE_BYPID the list may change if fill_file()
> > > is sleeping. Conclusion at previous hackaton was that the single
> > > sysctl is doing too much.
> > >
> > > So I changed fstat to first call kvm_getprocs() getting all pids,
> > > and then kvm_getfiles(KERN_FILE_BYPID) for each pid. This allows
> > > to abort the list traversal after a matching process has been found.
> > > So we don't have to deal with dead next pointer. Of course fstat
> > > gets slower as it does multiple sysctl with linear process search,
> > > but slow is better than crash.
> > >
> > > It is necessary to lock the current process, otherwise fd_getfile(fdp)
> > > may use a freed struct filedesc. I am not happy with using
> > > &pr->ps_lock in exit1() as it adds additional sleeping points. With
> > > the diff below fork+exec, tcpbench, fstat in loops is stable. To
> > > avoid sleeps in allprocess loop I use RW_SLEEPFAIL.
> > >
> > > Note that only KERN_FILE_BYPID with given pid is safe. After we
> > > remove all other sysctl in userland, we can reject them in the
> > > kernel.
> > >
> > > Any idea how we can lock the process in a nicer way?
> > >
> >
> > As I remember, when I tried to introduce allprocess rwlock(9), the
> > only problem was false (?) witness(4) report. I like to follow this
> > way. I don???t mean various allprocess list iterators, as it was exposed
> > this was bad idea.
>
> We tried to fix this for several years. Interlocking of process
> list and single process lock with all the other locks is complicated.
> Discussion in Prague concluded, that too much logic was moved into
> the kernel when reading /dev/kmem was converted to sysctl. My diff
> moves complexity back to userland which is good.
>
> Your approach has to solve two problems, keeping the list consistent
> and locking a single process. Can you show your diff again, as
> consistent view of one process is a common problem. Maybe we can
> combine our ideas.
>
> Independent of that, can we commit the chunk below? When searching
> for a specific process, there is not need to traverse the list to
> the end. Also I check for arg >= 0 as this is cosistent with the
> arg < -1 check before. Then you don't have to think about what is
> special about process 0. There is no functional difference as
> process 0 has PS_SYSTEM set and is skipped anyway.
>
> ok?
>
ok mvs
> bluhm
>
> Index: kern/kern_sysctl.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_sysctl.c,v
> diff -u -p -r1.434 kern_sysctl.c
> --- kern/kern_sysctl.c 6 Aug 2024 12:36:54 -0000 1.434
> +++ kern/kern_sysctl.c 8 Aug 2024 09:47:45 -0000
> @@ -1679,7 +1679,7 @@ sysctl_file(int *name, u_int namelen, ch
> */
> if (pr->ps_flags & (PS_SYSTEM | PS_EMBRYO | PS_EXITING))
> continue;
> - if (arg > 0 && pr->ps_pid != (pid_t)arg) {
> + if (arg >= 0 && pr->ps_pid != (pid_t)arg) {
> /* not the pid we are looking for */
> continue;
> }
> @@ -1699,6 +1699,9 @@ sysctl_file(int *name, u_int namelen, ch
> FILLIT(fp, fdp, i, NULL, pr);
> FRELE(fp, p);
> }
> + /* pid is unique, stop searching */
> + if (arg >= 0)
> + break;
> }
> if (!matched)
> error = ESRCH;
>
fstat sysctl crashes kernel