From: Vitaliy Makkoveev Subject: Re: fstat sysctl crashes kernel To: Alexander Bluhm Cc: Vitaliy Makkoveev , tech@openbsd.org Date: Thu, 8 Aug 2024 17:15:11 +0300 On Thu, Aug 08, 2024 at 01:21:54PM +0200, Alexander Bluhm wrote: > On Thu, Aug 08, 2024 at 02:03:19AM +0300, Vitaliy Makkoveev wrote: > > > On 8 Aug 2024, at 01:49, Alexander Bluhm wrote: > > > > > > Hi, > > > > > > There is this long lasting problem that fstat(1) in a loop during > > > heavy network load can crash the kernel. Bug is that while traversing > > > allprocess in sysctl KERN_FILE_BYPID the list may change if fill_file() > > > is sleeping. Conclusion at previous hackaton was that the single > > > sysctl is doing too much. > > > > > > So I changed fstat to first call kvm_getprocs() getting all pids, > > > and then kvm_getfiles(KERN_FILE_BYPID) for each pid. This allows > > > to abort the list traversal after a matching process has been found. > > > So we don't have to deal with dead next pointer. Of course fstat > > > gets slower as it does multiple sysctl with linear process search, > > > but slow is better than crash. > > > > > > It is necessary to lock the current process, otherwise fd_getfile(fdp) > > > may use a freed struct filedesc. I am not happy with using > > > &pr->ps_lock in exit1() as it adds additional sleeping points. With > > > the diff below fork+exec, tcpbench, fstat in loops is stable. To > > > avoid sleeps in allprocess loop I use RW_SLEEPFAIL. > > > > > > Note that only KERN_FILE_BYPID with given pid is safe. After we > > > remove all other sysctl in userland, we can reject them in the > > > kernel. > > > > > > Any idea how we can lock the process in a nicer way? > > > > > > > As I remember, when I tried to introduce allprocess rwlock(9), the > > only problem was false (?) witness(4) report. I like to follow this > > way. I don???t mean various allprocess list iterators, as it was exposed > > this was bad idea. > > We tried to fix this for several years. Interlocking of process > list and single process lock with all the other locks is complicated. > Discussion in Prague concluded, that too much logic was moved into > the kernel when reading /dev/kmem was converted to sysctl. My diff > moves complexity back to userland which is good. > > Your approach has to solve two problems, keeping the list consistent > and locking a single process. Can you show your diff again, as > consistent view of one process is a common problem. Maybe we can > combine our ideas. > > Independent of that, can we commit the chunk below? When searching > for a specific process, there is not need to traverse the list to > the end. Also I check for arg >= 0 as this is cosistent with the > arg < -1 check before. Then you don't have to think about what is > special about process 0. There is no functional difference as > process 0 has PS_SYSTEM set and is skipped anyway. > > ok? > ok mvs > bluhm > > Index: kern/kern_sysctl.c > =================================================================== > RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_sysctl.c,v > diff -u -p -r1.434 kern_sysctl.c > --- kern/kern_sysctl.c 6 Aug 2024 12:36:54 -0000 1.434 > +++ kern/kern_sysctl.c 8 Aug 2024 09:47:45 -0000 > @@ -1679,7 +1679,7 @@ sysctl_file(int *name, u_int namelen, ch > */ > if (pr->ps_flags & (PS_SYSTEM | PS_EMBRYO | PS_EXITING)) > continue; > - if (arg > 0 && pr->ps_pid != (pid_t)arg) { > + if (arg >= 0 && pr->ps_pid != (pid_t)arg) { > /* not the pid we are looking for */ > continue; > } > @@ -1699,6 +1699,9 @@ sysctl_file(int *name, u_int namelen, ch > FILLIT(fp, fdp, i, NULL, pr); > FRELE(fp, p); > } > + /* pid is unique, stop searching */ > + if (arg >= 0) > + break; > } > if (!matched) > error = ESRCH; >