Index | Thread | Search

From:
Mateusz Guzik <mjguzik@gmail.com>
Subject:
Re: fstat sysctl crashes kernel
To:
Vitaliy Makkoveev <mvs@openbsd.org>
Cc:
tech@openbsd.org
Date:
Thu, 8 Aug 2024 14:57:39 +0200

Download raw body.

Thread
  • Mateusz Guzik:

    fstat sysctl crashes kernel

On Thu, Aug 08, 2024 at 11:28:09AM +0000, Vitaliy Makkoveev wrote:
> On Thu, Aug 08, 2024 at 02:03:19AM +0300, Vitaliy Makkoveev wrote:
> > > On 8 Aug 2024, at 01:49, Alexander Bluhm <bluhm@openbsd.org> wrote:
> > As I remember, when I tried to introduce allprocess rwlock(9), the
> > only problem was false (?) witness(4) report. I like to follow this
> > way. I don’t mean various allprocess list iterators, as it was exposed
> > this was bad idea.
> > 
> 
> The minimal diff. Since both fork1() and exit1() are kernel locked, only
> sysctl_file() `allprocess' loops protected with `allprocess_lock'
> rwlock(9).
> 
> Index: sys/kern/kern_sysctl.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_sysctl.c,v
> diff -u -p -r1.435 kern_sysctl.c
> --- sys/kern/kern_sysctl.c	8 Aug 2024 10:25:00 -0000	1.435
> +++ sys/kern/kern_sysctl.c	8 Aug 2024 10:55:09 -0000
> @@ -1675,6 +1675,7 @@ sysctl_file(int *name, u_int namelen, ch
>  			break;
>  		}
>  		matched = 0;
> +		rw_enter_read(&allprocess_lock);
>  		LIST_FOREACH(pr, &allprocess, ps_list) {
>  			/*
>  			 * skip system, exiting, embryonic and undead
> @@ -1703,10 +1704,12 @@ sysctl_file(int *name, u_int namelen, ch
>  				FRELE(fp, p);
>  			}
>  		}
> +		rw_exit_read(&allprocess_lock);
>  		if (!matched)
>  			error = ESRCH;
>  		break;
>  	case KERN_FILE_BYUID:
> +		rw_enter_read(&allprocess_lock);
>  		LIST_FOREACH(pr, &allprocess, ps_list) {
>  			/*
>  			 * skip system, exiting, embryonic and undead
> @@ -1732,6 +1735,7 @@ sysctl_file(int *name, u_int namelen, ch
>  				FRELE(fp, p);
>  			}
>  		}
> +		rw_exit_read(&allprocess_lock);
>  		break;
>  	default:
>  		error = EINVAL;

[sending take #2]

While I agree with the introduction of the lock, the proposed
implementation holds it across copyout.

While technically legal in some cases, it tends to be a bad idea because
such an operation can stall indefinitely, also stalling everyone trying
to grab the lock.

This definitely makes it a no-go for something like the allprocess list
-- suppose someone got stalled like that and now you can't even fork
do diagnose anything because that writelocks it, which has to wait for
that thread.

Anyhow, the stock standard approach boils down to 2 parts:
- iterating the list with a marker
- delaying the process exit or pretending it's no longer there

Conceptually it looks like this:

in exit1:
mtx_enter(&pr->ps_mtx);
atomic_setbits_int(&pr->ps_flags, PS_EXITING);
if (pr->ps_procholders > 0) { /* block waiting for wake up here */ }
....

in the iterator:
mtx_enter(&pr->ps_mtx);
if (pr->ps_flags & PS_EXITING)
        /* unlock and continue the loop */
        ...

pr->ps_procholders++;
mtx_leave(&pr->ps_mtx); /* now any exit will wait for us to finish */

then:
1. insert the dummy process as the one after pr
2. unlock stuff, do the fd work
3. drop procholders + do wakeup if needed
4. continue iteration from the marker->next process

of course there may be more than one marker on the list if there are
others doing the walking. The open-coded LIST_FOREACH will probably want
to get wrapped with a macro intentionally skipping the marker proc.

marker can be indicatd with a dedicated process state, flag or whatever
other signifier one can plop in there.o

For example you could extend p_states to include PRS_MARKER.