From: Alexander Bluhm Subject: Re: fstat sysctl crashes kernel To: Vitaliy Makkoveev Cc: tech@openbsd.org Date: Thu, 8 Aug 2024 13:21:54 +0200 On Thu, Aug 08, 2024 at 02:03:19AM +0300, Vitaliy Makkoveev wrote: > > On 8 Aug 2024, at 01:49, Alexander Bluhm wrote: > > > > Hi, > > > > There is this long lasting problem that fstat(1) in a loop during > > heavy network load can crash the kernel. Bug is that while traversing > > allprocess in sysctl KERN_FILE_BYPID the list may change if fill_file() > > is sleeping. Conclusion at previous hackaton was that the single > > sysctl is doing too much. > > > > So I changed fstat to first call kvm_getprocs() getting all pids, > > and then kvm_getfiles(KERN_FILE_BYPID) for each pid. This allows > > to abort the list traversal after a matching process has been found. > > So we don't have to deal with dead next pointer. Of course fstat > > gets slower as it does multiple sysctl with linear process search, > > but slow is better than crash. > > > > It is necessary to lock the current process, otherwise fd_getfile(fdp) > > may use a freed struct filedesc. I am not happy with using > > &pr->ps_lock in exit1() as it adds additional sleeping points. With > > the diff below fork+exec, tcpbench, fstat in loops is stable. To > > avoid sleeps in allprocess loop I use RW_SLEEPFAIL. > > > > Note that only KERN_FILE_BYPID with given pid is safe. After we > > remove all other sysctl in userland, we can reject them in the > > kernel. > > > > Any idea how we can lock the process in a nicer way? > > > > As I remember, when I tried to introduce allprocess rwlock(9), the > only problem was false (?) witness(4) report. I like to follow this > way. I don???t mean various allprocess list iterators, as it was exposed > this was bad idea. We tried to fix this for several years. Interlocking of process list and single process lock with all the other locks is complicated. Discussion in Prague concluded, that too much logic was moved into the kernel when reading /dev/kmem was converted to sysctl. My diff moves complexity back to userland which is good. Your approach has to solve two problems, keeping the list consistent and locking a single process. Can you show your diff again, as consistent view of one process is a common problem. Maybe we can combine our ideas. Independent of that, can we commit the chunk below? When searching for a specific process, there is not need to traverse the list to the end. Also I check for arg >= 0 as this is cosistent with the arg < -1 check before. Then you don't have to think about what is special about process 0. There is no functional difference as process 0 has PS_SYSTEM set and is skipped anyway. ok? bluhm Index: kern/kern_sysctl.c =================================================================== RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_sysctl.c,v diff -u -p -r1.434 kern_sysctl.c --- kern/kern_sysctl.c 6 Aug 2024 12:36:54 -0000 1.434 +++ kern/kern_sysctl.c 8 Aug 2024 09:47:45 -0000 @@ -1679,7 +1679,7 @@ sysctl_file(int *name, u_int namelen, ch */ if (pr->ps_flags & (PS_SYSTEM | PS_EMBRYO | PS_EXITING)) continue; - if (arg > 0 && pr->ps_pid != (pid_t)arg) { + if (arg >= 0 && pr->ps_pid != (pid_t)arg) { /* not the pid we are looking for */ continue; } @@ -1699,6 +1699,9 @@ sysctl_file(int *name, u_int namelen, ch FILLIT(fp, fdp, i, NULL, pr); FRELE(fp, p); } + /* pid is unique, stop searching */ + if (arg >= 0) + break; } if (!matched) error = ESRCH;