From: Kirill A. Korinsky Subject: Re: sys/nfs: fix TCP socket use after reconnect To: Alexander Bluhm Cc: tech@openbsd.org Date: Fri, 24 Apr 2026 12:53:19 +0200 On Fri, 24 Apr 2026 10:32:10 +0200, Alexander Bluhm wrote: > > On Tue, Apr 21, 2026 at 12:55:46PM +0200, Kirill A. Korinsky wrote: > > tech@, > > > > There is a race in the NFS/TCP client when several requests are active > > and the connection stalls or times out. One thread enters nfs_receive(), > > takes the current socket pointer, drops nfs_sndlock(), and waits in > > soreceive(). Another thread on the same mount may then enter > > nfs_reconnect(), decide the connection is dead, and close that same > > socket. > > > > The problem is that the first thread may still use the old socket after > > reconnect has already replaced it. Under heavy parallel NFS/TCP load, > > that can wedge reconnect or socket shutdown paths. > > > > Suggested fix is simple: take a socket reference before the receive path > > uses the socket, and release it only after that path is done. > > > > This is not a strict lock order deadlock. It starts as a socket lifetime > > bug, then can show up as an apparent deadlock if the wedged path still > > owns kernel_lock. On a 2 CPU machine that is enough for a full hang: one > > CPU owns kernel_lock, the other spins waiting for it. > > > > If system has more than 2 CPU, it will survive, but NFS share will be > > locked, and on reboot system hangs with "sync devices". > > > > I encountered it few times per week when mount NFS share to laptop via > > wifi with options: soft,intr,tcp,-x=2 > > > > I use this diff for more than a week now, and haven't seen NFS issues so > > far, it actually works quite good. Yes, some times IO returns invalid > > command and I see in dmesg: short receive ... from nfs server ...; > > but it is a kind of expected behaviour. > > > > Tests? Feedback? Ok? > > You replace some return with goto errout. This does some additional > cleanup. As error == EWOULDBLOCK the first block is not taken and > as *mp should be NULL after an error the additional m_freemp() > should hav no effect. Also nfs_realign() should have no effect. > > Still it needs additional review for this new error path. I this > change intensional? > Not really intensional, more like to keep diff smaller and avoid duplicating sorele with if condition and because that reads as safe to use that way. -- wbr, Kirill