Index | Thread | Search

From:
Vitaliy Makkoveev <mvs@openbsd.org>
Subject:
Re: fstat sysctl crashes kernel
To:
Alexander Bluhm <bluhm@openbsd.org>
Cc:
Vitaliy Makkoveev <otto@bsdbox.dev>, tech@openbsd.org
Date:
Thu, 8 Aug 2024 17:15:11 +0300

Download raw body.

Thread
  • Vitaliy Makkoveev:

    fstat sysctl crashes kernel

  • Vitaliy Makkoveev:

    fstat sysctl crashes kernel

  • On Thu, Aug 08, 2024 at 01:21:54PM +0200, Alexander Bluhm wrote:
    > On Thu, Aug 08, 2024 at 02:03:19AM +0300, Vitaliy Makkoveev wrote:
    > > > On 8 Aug 2024, at 01:49, Alexander Bluhm <bluhm@openbsd.org> wrote:
    > > > 
    > > > Hi,
    > > > 
    > > > There is this long lasting problem that fstat(1) in a loop during
    > > > heavy network load can crash the kernel.  Bug is that while traversing
    > > > allprocess in sysctl KERN_FILE_BYPID the list may change if fill_file()
    > > > is sleeping.  Conclusion at previous hackaton was that the single
    > > > sysctl is doing too much.
    > > > 
    > > > So I changed fstat to first call kvm_getprocs() getting all pids,
    > > > and then kvm_getfiles(KERN_FILE_BYPID) for each pid.  This allows
    > > > to abort the list traversal after a matching process has been found.
    > > > So we don't have to deal with dead next pointer.  Of course fstat
    > > > gets slower as it does multiple sysctl with linear process search,
    > > > but slow is better than crash.
    > > > 
    > > > It is necessary to lock the current process, otherwise fd_getfile(fdp)
    > > > may use a freed struct filedesc.  I am not happy with using
    > > > &pr->ps_lock in exit1() as it adds additional sleeping points.  With
    > > > the diff below fork+exec, tcpbench, fstat in loops is stable.  To
    > > > avoid sleeps in allprocess loop I use RW_SLEEPFAIL.
    > > > 
    > > > Note that only KERN_FILE_BYPID with given pid is safe.  After we
    > > > remove all other sysctl in userland, we can reject them in the
    > > > kernel.
    > > > 
    > > > Any idea how we can lock the process in a nicer way?
    > > > 
    > > 
    > > As I remember, when I tried to introduce allprocess rwlock(9), the
    > > only problem was false (?) witness(4) report. I like to follow this
    > > way. I don???t mean various allprocess list iterators, as it was exposed
    > > this was bad idea.
    > 
    > We tried to fix this for several years.  Interlocking of process
    > list and single process lock with all the other locks is complicated.
    > Discussion in Prague concluded, that too much logic was moved into
    > the kernel when reading /dev/kmem was converted to sysctl.  My diff
    > moves complexity back to userland which is good.
    > 
    > Your approach has to solve two problems, keeping the list consistent
    > and locking a single process.  Can you show your diff again, as
    > consistent view of one process is a common problem.  Maybe we can
    > combine our ideas.
    > 
    > Independent of that, can we commit the chunk below?  When searching
    > for a specific process, there is not need to traverse the list to
    > the end.  Also I check for arg >= 0 as this is cosistent with the
    > arg < -1 check before.  Then you don't have to think about what is
    > special about process 0.  There is no functional difference as
    > process 0 has PS_SYSTEM set and is skipped anyway.
    > 
    > ok?
    > 
    
    ok mvs
    
    > bluhm
    > 
    > Index: kern/kern_sysctl.c
    > ===================================================================
    > RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_sysctl.c,v
    > diff -u -p -r1.434 kern_sysctl.c
    > --- kern/kern_sysctl.c	6 Aug 2024 12:36:54 -0000	1.434
    > +++ kern/kern_sysctl.c	8 Aug 2024 09:47:45 -0000
    > @@ -1679,7 +1679,7 @@ sysctl_file(int *name, u_int namelen, ch
    >  			 */
    >  			if (pr->ps_flags & (PS_SYSTEM | PS_EMBRYO | PS_EXITING))
    >  				continue;
    > -			if (arg > 0 && pr->ps_pid != (pid_t)arg) {
    > +			if (arg >= 0 && pr->ps_pid != (pid_t)arg) {
    >  				/* not the pid we are looking for */
    >  				continue;
    >  			}
    > @@ -1699,6 +1699,9 @@ sysctl_file(int *name, u_int namelen, ch
    >  				FILLIT(fp, fdp, i, NULL, pr);
    >  				FRELE(fp, p);
    >  			}
    > +			/* pid is unique, stop searching */
    > +			if (arg >= 0)
    > +				break;
    >  		}
    >  		if (!matched)
    >  			error = ESRCH;
    > 
    
    
    
  • Vitaliy Makkoveev:

    fstat sysctl crashes kernel

  • Vitaliy Makkoveev:

    fstat sysctl crashes kernel