Index | Thread | Search

From:
Vitaliy Makkoveev <mvs@openbsd.org>
Subject:
Re: fstat(1) caused a weird hang on 7.7
To:
Alexander Bluhm <bluhm@openbsd.org>
Cc:
"H. Hartzer" <h@hartzer.sh>, tech@openbsd.org
Date:
Thu, 29 May 2025 13:28:27 +0300

Download raw body.

Thread
On Wed, May 28, 2025 at 04:55:35PM +0900, Alexander Bluhm wrote:
> On Tue, May 27, 2025 at 01:59:25PM +0000, H. Hartzer wrote:
> > Please let me know if this isn't the best list for this, and I'll send
> > similiar reports elsewhere in the future.
> 
> bugs@openbsd.org would be slightly better.  Does not matter much,
> the same people read both bugs@ and tech@.
> 
> > I had the most bizarre issue. I ran `fstat | grep 8080` and fstat hang.
> > This caused a bunch of other things to not work, but the system was
> > partly usable.
> 
> This is a deadlock in the sysctl(2) system call that is used by
> fstat(1).  We tried to improve this while the network stack was
> unlocked, maybe we missed something.  At a first glance, I don't
> see relevant fixes that were commited afer 7.7 release.  You might
> consider using openbsd current snapshot to see it the bug is still
> present.
> 
> > What did not:
> > ps
> 
> That is bad, as `ps axl` would show the wait queue where it hangs.
> We need the wait queue to debug it further.
> 
> > I have no idea where to begin on this. I couldn't reproduce it after
> > rebooting. 
> 
> Try to stress test to reproduce.  Run fstat in a loop.  There must
> be some other activity to trigger it.
> 
> As ps axl does not work, we need the kernel debugger output.  For
> that you have to activate ddb.  Write ddb.console=1 into /etc/sysctl.conf
> and reboot.  With serial console break brings you to ddb, glass
> console has the hot key Ctrl-Alt-Esc.  To get out of ddb, type cont
> at ddb> prompt.
> 
> When it happens, drop to ddb.  Then type ps at the ddb> prompt.  On
> serial copy all the output, with monior take a picture.  It scrols
> down page by page, we need all pages.  Send the pictures to this
> list, reducing resolution first.  We want to read the text, but not
> fill our mailbox with your gigapixel camara images.
> 
> Maybe stack traces of the hanging processes would be iteresting,
> but that is too hard to explain in advance.
> 
> As the machine is unusable anyway, type boot reboot at ddb> instead
> of continue.
> 
> > Never had any issues with fstat before.
> 
> It was much more broken before, it could crash the kernel.
> 
> > Maybe I need more open file descriptors?
> 
> No.
> 
> > The last thing I see on fstat from the lists is that it was unlocked:
> > https://marc.info/?l=openbsd-tech&m=173607167913668&w=2
> 
> This is about fstat(2) system call.  Except the name it has nothing
> in common with fstat(1) userland tool.
> 
> bluhm
> 

I suspect it stuck in malloc(M_WAITOK) allocation of `kf'. This sleeping
malloc() happens with `sysctl_lock' held, so the whole sysctl layer
became unusable. Fortunately, we expect context switch in `allprocess'
foreach loops, so we could keep kernel lock only around them and avoid
sleeping memory allocation under `sysctl_lock'.

Actually, we don't need to wire memory in the most sysctl() cases, so we
could avoid `sysctl_lock' and rely on kernel lock only.

Note, we could have memory leak somewhere, so I'm curious the memory
usage stats in this machine.


Index: sys/kern/kern_sysctl.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_sysctl.c,v
diff -u -p -r1.468 kern_sysctl.c
--- sys/kern/kern_sysctl.c	9 May 2025 14:53:22 -0000	1.468
+++ sys/kern/kern_sysctl.c	29 May 2025 10:25:00 -0000
@@ -401,6 +401,8 @@ kern_sysctl_dirs(int top_name, int *name
 	int error;
 
 	switch (top_name) {
+	case KERN_FILE:
+		return (sysctl_file(name, namelen, oldp, oldlenp, p));
 	case KERN_MALLOCSTATS:
 		return (sysctl_malloc(name, namelen, oldp, oldlenp,
 		    newp, newlen, p));
@@ -447,8 +449,6 @@ kern_sysctl_dirs_locked(int top_name, in
 		     newp, newlen, oldp, oldlenp, p));
 	case KERN_PROC_VMMAP:
 		return (sysctl_proc_vmmap(name, namelen, oldp, oldlenp, p));
-	case KERN_FILE:
-		return (sysctl_file(name, namelen, oldp, oldlenp, p));
 #endif
 #if defined(GPROF) || defined(DDBPROF)
 	case KERN_PROF:
@@ -1776,8 +1776,11 @@ do {									\
 					if (af == AF_INET || af == AF_INET6)
 						skip = 1;
 				}
-				if (!skip)
+				if (!skip) {
+					KERNEL_LOCK();
 					FILLIT(fp, NULL, 0, NULL, NULL);
+					KERNEL_UNLOCK();
+				}
 			}
 		}
 		break;
@@ -1788,6 +1791,7 @@ do {									\
 			break;
 		}
 		matched = 0;
+		KERNEL_LOCK();
 		LIST_FOREACH(pr, &allprocess, ps_list) {
 			/*
 			 * skip system, exiting, embryonic and undead
@@ -1825,10 +1829,12 @@ do {									\
 			if (arg >= 0)
 				break;
 		}
+		KERNEL_UNLOCK();
 		if (!matched)
 			error = ESRCH;
 		break;
 	case KERN_FILE_BYUID:
+		KERNEL_LOCK();
 		LIST_FOREACH(pr, &allprocess, ps_list) {
 			/*
 			 * skip system, exiting, embryonic and undead
@@ -1859,6 +1865,7 @@ do {									\
 
 			refcnt_rele_wake(&pr->ps_refcnt);
 		}
+		KERNEL_UNLOCK();
 		break;
 	default:
 		error = EINVAL;