
From: Alexander Bluhm <bluhm@openbsd.org>
Subject: Re: fstat(1) caused a weird hang on 7.7
To: "H. Hartzer" <h@hartzer.sh>
Cc: tech@openbsd.org
Date: Wed, 28 May 2025 16:55:35 +0900

On Tue, May 27, 2025 at 01:59:25PM +0000, H. Hartzer wrote:
> Please let me know if this isn't the best list for this, and I'll send
> similar reports elsewhere in the future.

bugs@openbsd.org would be slightly better.  Does not matter much,
the same people read both bugs@ and tech@.

> I had the most bizarre issue. I ran `fstat | grep 8080` and fstat hung.
> This caused a bunch of other things to not work, but the system was
> partly usable.

This is a deadlock in the sysctl(2) system call that is used by
fstat(1).  We tried to improve this while the network stack was
unlocked, maybe we missed something.  At first glance, I don't
see relevant fixes that were committed after the 7.7 release.  You
might consider using an OpenBSD-current snapshot to see if the bug
is still present.

> What did not:
> ps

That is bad, as `ps axl` would show the wait queue where it hangs.
We need the wait queue to debug it further.

> I have no idea where to begin on this. I couldn't reproduce it after
> rebooting. 

Try to stress test to reproduce.  Run fstat in a loop.  There must
be some other activity to trigger it.
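
A minimal sketch of such a loop, assuming a plain sh(1) shell.  The
stress_loop helper and its arguments are made up for illustration,
they are not part of the original report.  It prints a counter before
each run, so the last line on screen shows which iteration hung.

```sh
#!/bin/sh
# Hypothetical helper: run a command repeatedly and print a progress
# line before each run, so the last line printed identifies the
# iteration that never returned.
# Usage: stress_loop COUNT COMMAND [ARGS...]
stress_loop() {
	count=$1
	shift
	i=1
	while [ "$i" -le "$count" ]; do
		echo "run $i"
		# Discard the command's output; only a hang is of interest.
		"$@" >/dev/null 2>&1
		i=$((i + 1))
	done
}
```

For example, `stress_loop 100000 fstat` could run in one terminal
while other terminals generate network and file activity.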

As ps axl does not work, we need the kernel debugger output.  For
that you have to activate ddb.  Write ddb.console=1 into /etc/sysctl.conf
and reboot.  With a serial console, sending a break brings you to ddb;
on a glass console the hot key is Ctrl-Alt-Esc.  To get out of ddb,
type cont at the ddb> prompt.

When it happens, drop to ddb.  Then type ps at the ddb> prompt.  On
serial, copy all the output; with a monitor, take a picture.  It scrolls
down page by page, we need all pages.  Send the pictures to this
list, reducing resolution first.  We want to read the text, but not
fill our mailbox with your gigapixel camera images.

Maybe stack traces of the hanging processes would be interesting,
but that is too hard to explain in advance.

As the machine is unusable anyway, type boot reboot at ddb> instead
of continue.

> Never had any issues with fstat before.

It was much more broken before, it could crash the kernel.

> Maybe I need more open file descriptors?

No.

> The last thing I see on fstat from the lists is that it was unlocked:
> https://marc.info/?l=openbsd-tech&m=173607167913668&w=2

This is about the fstat(2) system call.  Apart from the name, it has
nothing in common with the fstat(1) userland tool.

bluhm