Index | Thread | Search

From:
Vitaliy Makkoveev <mvs@openbsd.org>
Subject:
Re: sys/ffs: reclaim vnode before dropping last ref
To:
tech@openbsd.org
Date:
Wed, 24 Jun 2026 14:52:26 +0000

Download raw body.

Thread
On Wed, Jun 24, 2026 at 12:04:32PM +0200, Kirill A. Korinsky wrote:
> On Wed, 24 Jun 2026 11:53:53 +0200,
> Jeremie Courreges-Anglas <jca@wxcvbn.org> wrote:
> > 
> > On Wed, Jun 24, 2026 at 03:35:08AM -0600, Theo de Raadt wrote:
> > > This vrele/vgone error is safe if we want to keep everything biglock.....
> > 
> > For people not in the hackroom, right now a kernel with kirill's diff
> > breaks:
> > 
> > root on sd0a (6cd23ae55f48f660.a) swap on sd0b dump on sd0b
> > uvm_fault(0xffffef843fa328a0, 0x18, 0, 1) -> e
> > kernel: page fault trap, code=0
> > Stopped at      spec_close+0x35:        movl    0x18(%rax),%eax
> >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > *362332  18108      0         0x3          0    0K swapctl
> > spec_close(ffff8000e608aa98) at spec_close+0x35
> > VOP_CLOSE(ffffef83cbf32e60,4,ffffffffffffffff,ffff8000e604d238) at VOP_CLOSE+0x
> > 59
> > vclean(ffffef83cbf32e60,8,ffff8000e604d238) at vclean+0x1c1
> > vgonel(ffffef83cbf32e60,ffff8000e604d238) at vgonel+0x63
> > ffs_vinit(ffff8000002f2800,ffff8000e608ac00) at ffs_vinit+0xc6
> > ffs_vget(ffff8000002f2800,1309a,ffff8000e608acc8) at ffs_vget+0x31f
> > ufs_lookup() at ufs_lookup+0xafd
> > VOP_LOOKUP(ffffef811f0a1510,ffff8000e608b2d0,ffff8000e608b300) at VOP_LOOKUP+0x
> > 3f
> > vfs_lookup(ffff8000e608b2a0) at vfs_lookup+0x39b
> > namei(ffff8000e608b2a0) at namei+0x475
> > sys_swapctl(ffff8000e604d238,ffff8000e608b470,ffff8000e608b3f0) at sys_swapctl+
> > 0x1db
> > syscall(ffff8000e608b470) at syscall+0x5d5
> > Xsyscall() at Xsyscall+0x128
> > end of kernel
> > end trace frame: 0x75f3ac954920, count: 2
> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > reports.  Insufficient info makes it difficult to find and fix bugs.
> > ddb{0}> bo re
> > rebooting...
> > 
> 
> Yeah, it has the second part which I'm still thinking in a way that I broke.
> 
> It boots, but...
> 
> So.
> 
> Several checkalias() callers discard the vnode passed to checkalias() after
> moving its filesystem node to the returned alias. In the uncommon block
> device alias case, checkalias() leaves that discarded vnode as a shell: it
> uses spec_vops, but has no specinfo.
> 
> If vgonel() reclaims such a vnode while the caller still holds the last
> reference, vclean(DOCLOSE) treats it as active and dispatches VOP_CLOSE() to
> spec_close(), which expects v_specinfo to be present and dereferences
> v_rdev.
> 
> Such shell vnodes do not represent an opened device instance, so there is
> nothing to close through spec_vops. Skip DOCLOSE for spec_vops vnodes
> without specinfo.
> 

Not my area, but seems like reference counting bug.

> And here the diff:
> 
> Index: sys/kern/vfs_subr.c
> ===================================================================
> RCS file: /home/cvs/src/sys/kern/vfs_subr.c,v
> diff -u -p -r1.334 vfs_subr.c
> --- sys/kern/vfs_subr.c	22 Jun 2026 17:35:50 -0000	1.334
> +++ sys/kern/vfs_subr.c	24 Jun 2026 09:36:03 -0000
> @@ -1168,7 +1168,7 @@ vgonel(struct vnode *vp, struct proc *p)
>  {
>  	struct vnode *vq;
>  	struct vnode *vx;
> -	int s;
> +	int s, flags = DOCLOSE;
>  
>  	KASSERT(vp->v_uvcount == 0);
>  
> @@ -1188,7 +1188,9 @@ vgonel(struct vnode *vp, struct proc *p)
>  	/*
>  	 * Clean out the filesystem specific data.
>  	 */
> -	vclean(vp, DOCLOSE, p);
> +	if (vp->v_op == &spec_vops && vp->v_specinfo == NULL)
> +		flags = 0;
> +	vclean(vp, flags, p);
>  	/*
>  	 * Delete from old mount point vnode list, if on one.
>  	 */
> 
> 
> -- 
> wbr, Kirill
>