From: Kirill A. Korinsky
Subject: sys/nfs: fix TCP socket use after reconnect
To: OpenBSD tech
Date: Tue, 21 Apr 2026 12:55:46 +0200

tech@,

There is a race in the NFS/TCP client when several requests are active and
the connection stalls or times out. One thread enters nfs_receive(), takes
the current socket pointer, drops nfs_sndlock(), and waits in soreceive().
Another thread on the same mount may then enter nfs_reconnect(), decide the
connection is dead, and close that same socket. The first thread may then
keep using the old socket after reconnect has already replaced it. Under
heavy parallel NFS/TCP load, that can wedge the reconnect or socket
shutdown paths.

The suggested fix is simple: take a socket reference before the receive
path uses the socket, and release it only after that path is done.

This is not a strict lock order deadlock. It starts as a socket lifetime
bug, and can then show up as an apparent deadlock if the wedged path still
owns kernel_lock. On a 2 CPU machine that is enough for a full hang: one
CPU owns kernel_lock, the other spins waiting for it. A system with more
than 2 CPUs survives, but the NFS share stays wedged, and on reboot the
system hangs at "sync devices".

I ran into this a few times per week when mounting an NFS share on a
laptop over wifi with the options: soft,intr,tcp,-x=2

I have been running this diff for more than a week now and haven't seen
any NFS issues so far; it works quite well. Yes, sometimes I/O returns an
invalid command and I see "short receive ... from nfs server ..." in
dmesg, but that is more or less expected behaviour.

Tests? Feedback? Ok?
Index: sys/nfs/nfs_socket.c
===================================================================
RCS file: /home/cvs/src/sys/nfs/nfs_socket.c,v
diff -u -p -r1.156 nfs_socket.c
--- sys/nfs/nfs_socket.c	16 Feb 2025 16:05:07 -0000	1.156
+++ sys/nfs/nfs_socket.c	10 Apr 2026 21:13:27 -0000
@@ -525,7 +525,7 @@ nfs_send(struct socket *so, struct mbuf
 int
 nfs_receive(struct nfsreq *rep, struct mbuf **aname, struct mbuf **mp)
 {
-	struct socket *so;
+	struct socket *so = NULL;
 	struct uio auio;
 	struct iovec aio;
 	struct mbuf *m;
@@ -577,6 +577,11 @@ tryagain:
 		}
 		goto tryagain;
 	}
+	/*
+	 * Keep the socket alive while using the snapshot taken
+	 * under nfs_sndlock(), even if another thread reconnects.
+	 */
+	soref(so);
 	while (rep->r_flags & R_MUSTRESEND) {
 		m = m_copym(rep->r_mreq, 0, M_COPYALL, M_WAIT);
 		nfsstats.rpcretries++;
@@ -586,9 +591,13 @@ tryagain:
 		if (error) {
 			if (error == EINTR || error == ERESTART ||
 			    (error = nfs_reconnect(rep)) != 0) {
+				sorele(so);
+				so = NULL;
 				nfs_sndunlock(&rep->r_nmp->nm_flag);
 				return (error);
 			}
+			sorele(so);
+			so = NULL;
 			goto tryagain;
 		}
 	}
@@ -608,8 +617,10 @@ tryagain:
 			error = soreceive(so, NULL, &auio, NULL, NULL,
 			    &rcvflg, 0);
 			if (error == EWOULDBLOCK && rep) {
-				if (rep->r_flags & R_SOFTTERM)
-					return (EINTR);
+				if (rep->r_flags & R_SOFTTERM) {
+					error = EINTR;
+					goto errout;
+				}
 				/*
 				 * looks like the server died after it
 				 * received the request, make sure
@@ -678,8 +689,10 @@ tryagain:
 			    &rcvflg, 0);
 			m_freem(control);
 			if (error == EWOULDBLOCK && rep) {
-				if (rep->r_flags & R_SOFTTERM)
-					return (EINTR);
+				if (rep->r_flags & R_SOFTTERM) {
+					error = EINTR;
+					goto errout;
+				}
 			}
 		} while (error == EWOULDBLOCK ||
 		    (!error && *mp == NULL && control));
@@ -690,6 +703,10 @@ tryagain:
 		len -= auio.uio_resid;
 	}
 errout:
+	if (so != NULL) {
+		sorele(so);
+		so = NULL;
+	}
 	if (error && error != EINTR && error != ERESTART) {
 		m_freemp(mp);
 		if (error != EPIPE)

-- 
wbr, Kirill