Index | Thread | Search

From:
Mateusz Guzik <mjguzik@gmail.com>
Subject:
concerning vfs_stall_barrier()
To:
claudio@openbsd.org
Cc:
tech-openbsd <tech@openbsd.org>
Date:
Sat, 13 Sep 2025 10:44:25 +0200

Download raw body.

Thread
One of the things which irk me about OpenBSD is vfs_stall_barrier().

The placement in FREF is racy as the consumer can take the trip
through the lock just before writes are supposed to be frozen,
effectively defeating the mechanism.

I presume this got recognized and there was an attempt to remedy the problem in:
commit fae85b1e6961525c0b9017bb021d98e19c2e4451
Author: claudio <claudio@openbsd.org>
Date:   Thu Oct 21 09:59:13 2021 +0000

    Move vfs_stall_barrier() from the fd layer into vn_lock() and the vfs layer.
    vfs stalling is used by suspend/resume and by vmt(4) to stall any
    filesystem operation from altering the state on disk. All these
    operations will call vn_lock and be stalled. Adjust vfs_stall_barrier()
    to allow the lock owner to still progress so that suspend can sync
    the filesystems after stalling vfs operation.
    OK mpi@

However, that got reverted in:
commit 8e49fb4c2a30057c387ac447b5598aa90b4ad4fa
Author: claudio <claudio@openbsd.org>
Date:   Mon Oct 25 10:24:54 2021 +0000

    Revert commitid: ufM9BcSbXqfLpzBH;
    Move vfs_stall_barrier() from the fd layer into vn_lock() and the vfs layer.
    In some cases it can result in a deadlock while suspending.
    Discussed with mpi@ and deraadt@

I failed to find public discussions on the matter.

Even so, I can easily imagine how placement in vn_lock results in deadlocks.

Pick a consumer which locks 2 vnodes (rename as an example) and
suppose it races against a thread stalling vfs ops:

CPU0                            CPU1
lock vnode1
                                stall ops
stalled locking vnode2
                                lock vnode1

Now both threads are waiting on each other.

The correct (albeit time-consuming) way to approach this is to
maintain a counter of writes in progress (this includes vnode
teardown) and only bump it while none of the vnodes are locked yet
*or* do a "trywrite" and failing to get there, unlock, sleep and
retry. For a working example I refer you to FreeBSD, also pointing out
nasty spots which need this kind of treatment.