From: Martin Pieuchot <mpi@grenadille.net>
Subject: Re: concerning vfs_stall_barrier()
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: claudio@openbsd.org, tech-openbsd <tech@openbsd.org>
Date: Sat, 13 Sep 2025 11:28:01 +0200

On 13/09/25(Sat) 10:44, Mateusz Guzik wrote:
> One of the things which irk me about OpenBSD is vfs_stall_barrier().

I feel the same.

This is a common idiom in OpenBSD: use a big and ugly hack and forget
about it.  This is generally done when we're enable to work together to
implement better solutions.

> The placement in FREF is racy as the consumer can take the trip
> through the lock just before writes are supposed to be frozen,
> effectively defeating the mechanism.
> 
> I presume this got recognized and there was an attempt to remedy the problem in:

I don't believe it was recognized. Well apart from the empirical freezes
or crashes.  This issue has been present since the original commit.  This
has always puzzled me.

> commit fae85b1e6961525c0b9017bb021d98e19c2e4451
> Author: claudio <claudio@openbsd.org>
> Date:   Thu Oct 21 09:59:13 2021 +0000
> 
>     Move vfs_stall_barrier() from the fd layer into vn_lock() and the vfs layer.
>     vfs stalling is used by suspend/resume and by vmt(4) to stall any
>     filesystem operation from altering the state on disk. All these
>     operations will call vn_lock and be stalled. Adjust vfs_stall_barrier()
>     to allow the lock owner to still progress so that suspend can sync
>     the filesystems after stalling vfs operation.
>     OK mpi@
> 
> However, that got reverted in:
> commit 8e49fb4c2a30057c387ac447b5598aa90b4ad4fa
> Author: claudio <claudio@openbsd.org>
> Date:   Mon Oct 25 10:24:54 2021 +0000
> 
>     Revert commitid: ufM9BcSbXqfLpzBH;
>     Move vfs_stall_barrier() from the fd layer into vn_lock() and the vfs layer.
>     In some cases it can result in a deadlock while suspending.
>     Discussed with mpi@ and deraadt@
> 
> I failed to find public discussions on the matter.

There haven't been any.  I wish it would have been different from the
beginning.

> Even so, I can easily imagine how placement in vn_lock results in deadlocks.
> 
> Pick a consumer which locks 2 vnodes (rename as an example) and
> suppose it races against a thread stalling vfs ops:
> 
> CPU0                            CPU1
> lock vnode1
>                                 stall ops
> stalled locking vnode2
>                                 lock vnode1
> 
> Now both threads are waiting on each other.

Indeed.

> The correct (albeit time-consuming) way to approach this is to
> maintain a counter of writes in progress (this includes vnode
> teardown) and only bump it while none of the vnodes are locked yet
> *or* do a "trywrite" and failing to get there, unlock, sleep and
> retry. For a working example I refer you to FreeBSD, also pointing out
> nasty spots which need this kind of treatment.

Sure, could you send a diff for that?  Thanks!