From: Vitaliy Makkoveev Subject: Re: Don't take solock() in sosend() and soreceive() paths for unix(4) sockets To: Alexander Bluhm Cc: tech@openbsd.org Date: Fri, 3 May 2024 03:24:28 +0300 On Fri, May 03, 2024 at 02:10:49AM +0300, Vitaliy Makkoveev wrote: > On Fri, May 03, 2024 at 12:26:32AM +0200, Alexander Bluhm wrote: > > On Wed, May 01, 2024 at 01:01:22AM +0300, Vitaliy Makkoveev wrote: > > > On Tue, Apr 30, 2024 at 11:01:04PM +0300, Vitaliy Makkoveev wrote: > > > > Use `sb_mtx' mutex(9) and `sb_lock' rwlock(9) to protect `so_snd' and > > > > `so_rcv' of unix(4) sockets. > > > > > > > > The transmission of unix(4) sockets already half-unlocked because > > > > connected peer is not locked by solock() during sbappend*() call. Since > > > > the `so_snd' is protected by `sb_mtx' mutex(9) the re-locking is not > > > > required in uipc_rcvd() too. > > > > > > > > SB_OWNLOCK became redundant with SB_MTXLOCK, so remove it. SB_MTXLOCK > > > > was kept because checks against SB_MTXLOCK within sb*() routines look > > > > more consistent to me. > > > > > > > > Please note, the unlocked peer `so2' of unix(4) can't be disconnected > > > > while solock() is held on `so'. That's why unlocked sorwakeup() and > > > > sowwakeup() are fine, corresponding paths will never be followed. > > > > > > > > > > Add forgotten fifofs chunks. Against previous, this diff does direct > > > `so_rcv' dispose and cleanup in sofree(). This sockets is almost dead > > > and unlinked from everywhere include spliced peer. So concurrent > > > sotask() thread will just exit. This required to solve inode and so_rcv > > > locks ordering. Also this removes re-locking from sofree() for all > > > sockets. > > > > Passed regress with witness. > > > > One question, otherwise OK bluhm@ > > > > [skip] > > > > You change the order of EPIPE and ENOTCONN error checks here. > > The difference is relevant, EPIPE may signal a SIGPIPE. > > > > When writing into a socket that stops working, the error should be > > EPIPE. If there are two processes, A is writing to B, and B > > disconnects the socket, then A should get a EPIPE. > > > > Do you change the behavior? > > > > In sosend() we have a SS_CANTSENDMORE check with EPIPE error. > > So maybe the uipc_send() errors are irrelevant. > > > > Yes, I change behavior. I had concerns , so I checked FreeBSD version of > uipc_send() and found that they do "so->so_state & SS_ISCONNECTED" check > before "so->so_snd.sb_state & SBS_CANTSENDMORE". But sosend_generic() > has the different order. Since this code exists for a long time, I found > this order not significant. > > However, our fifofs is the only place where we do direct SS_CANTSENDMORE > flag modifications. We could keep solock() around them for serialization > with uipc_send() and keep existing checks order. > Please note, this time we have places where we take `sb_mtx' mutex(9) simultaneously for both `so_rcv' and `so_snd' and the `so_rcv' should be locked first. We have two path groups, where `so_snd' and `so_rcv' belong to different sockets. The one is the splicing sockets and the second is the unix(4) sockets. There is the different groups without intersection. Also, for unix(4) sockets case, SOCK_STREAM, SOCK_DGRAM and SOCK_SEQPACKET are different and can't be connected to each outher. Therefore, we could use different lock orders. The lock order for SOCK_DGRAM should be kept because these sockets can be connected to itself, so the lock order should be the same as in soreserve(). But this is not true for SOCK_STREAM and SOCK_SEQPACKET. So, in the uipc_send() `so_snd' could be locked first. So the hypothetical uipc_send() from the distant future could be as following without any additional solock() serialization. But since this requires some tricks with witness(4) and makes locking much more harder to understand, lets left it for the distant future. uipc_send(struct socket *so, ...) { /* skip */ mtx_enter(so->so_snd.sb_mtx); if (so->so_snd.sb_state & SS_CANTSENDMORE) { error = EPIPE; goto error; } if (unp->unp_conn == NULL) { error = ENOTCONN; goto error; } so2 = unp->unp_conn->unp_socket; mtx_enter(&so2->so_rcv.sb_mtx); /* sbappend*(so2, &so2->so_rcv, m) */ so->so_snd.sb_mbcnt = so2->so_rcv.sb_mbcnt; so->so_snd.sb_cc = so2->so_rcv.sb_cc; if (so2->so_rcv.sb_cc > 0) dowakeup = 1; mtx_leave(&so2->so_rcv.sb_mtx); mtx_leave(&so->so_snd.sb_mtx);