Index | Thread | Search

From:
Kirill A. Korinsky <kirill@korins.ky>
Subject:
Re: sys/nfs: fix TCP socket use after reconnect
To:
Alexander Bluhm <bluhm@openbsd.org>
Cc:
tech@openbsd.org
Date:
Fri, 24 Apr 2026 12:53:19 +0200

Download raw body.

Thread
On Fri, 24 Apr 2026 10:32:10 +0200,
Alexander Bluhm <bluhm@openbsd.org> wrote:
> 
> On Tue, Apr 21, 2026 at 12:55:46PM +0200, Kirill A. Korinsky wrote:
> > tech@,
> > 
> > There is a race in the NFS/TCP client when several requests are active
> > and the connection stalls or times out. One thread enters nfs_receive(),
> > takes the current socket pointer, drops nfs_sndlock(), and waits in
> > soreceive(). Another thread on the same mount may then enter
> > nfs_reconnect(), decide the connection is dead, and close that same
> > socket.
> > 
> > The problem is that the first thread may still use the old socket after
> > reconnect has already replaced it. Under heavy parallel NFS/TCP load,
> > that can wedge reconnect or socket shutdown paths.
> > 
> > Suggested fix is simple: take a socket reference before the receive path
> > uses the socket, and release it only after that path is done.
> > 
> > This is not a strict lock order deadlock. It starts as a socket lifetime
> > bug, then can show up as an apparent deadlock if the wedged path still
> > owns kernel_lock. On a 2 CPU machine that is enough for a full hang: one
> > CPU owns kernel_lock, the other spins waiting for it.
> > 
> > If system has more than 2 CPU, it will survive, but NFS share will be
> > locked, and on reboot system hangs with "sync devices".
> > 
> > I encountered it few times per week when mount NFS share to laptop via
> > wifi with options: soft,intr,tcp,-x=2
> > 
> > I use this diff for more than a week now, and haven't seen NFS issues so
> > far, it actually works quite good. Yes, some times IO returns invalid
> > command and I see in dmesg: short receive ... from nfs server ...;
> > but it is a kind of expected behaviour.
> > 
> > Tests? Feedback? Ok?
> 
> You replace some return with goto errout.  This does some additional
> cleanup.  As error == EWOULDBLOCK the first block is not taken and
> as *mp should be NULL after an error the additional m_freemp()
> should hav no effect.  Also nfs_realign() should have no effect.
> 
> Still it needs additional review for this new error path.  I this
> change intensional?
>

Not really intensional, more like to keep diff smaller and avoid duplicating
sorele with if condition and because that reads as safe to use that way.

-- 
wbr, Kirill