From: Vitaliy Makkoveev <mvs@openbsd.org>
Subject: Re: Don't take solock() in sosend() and soreceive() paths for unix(4) sockets
To: Alexander Bluhm <alexander.bluhm@gmx.net>
Cc: tech@openbsd.org
Date: Fri, 3 May 2024 03:24:28 +0300

On Fri, May 03, 2024 at 02:10:49AM +0300, Vitaliy Makkoveev wrote:
> On Fri, May 03, 2024 at 12:26:32AM +0200, Alexander Bluhm wrote:
> > On Wed, May 01, 2024 at 01:01:22AM +0300, Vitaliy Makkoveev wrote:
> > > On Tue, Apr 30, 2024 at 11:01:04PM +0300, Vitaliy Makkoveev wrote:
> > > > Use `sb_mtx' mutex(9) and `sb_lock' rwlock(9) to protect `so_snd' and
> > > > `so_rcv' of unix(4) sockets.
> > > > 
> > > > The transmission of unix(4) sockets already half-unlocked because
> > > > connected peer is not locked by solock() during sbappend*() call. Since
> > > > the `so_snd' is protected by `sb_mtx' mutex(9) the re-locking is not
> > > > required in uipc_rcvd() too.
> > > > 
> > > > SB_OWNLOCK became redundant with SB_MTXLOCK, so remove it. SB_MTXLOCK
> > > > was kept because checks against SB_MTXLOCK within sb*() routines look
> > > > more consistent to me.
> > > > 
> > > > Please note, the unlocked peer `so2' of unix(4) can't be disconnected
> > > > while solock() is held on `so'. That's why unlocked sorwakeup() and
> > > > sowwakeup() are fine, corresponding paths will never be followed.
> > > > 
> > > 
> > > Add forgotten fifofs chunks. Against previous, this diff does direct
> > > `so_rcv' dispose and cleanup in sofree(). This sockets is almost dead
> > > and unlinked from everywhere include spliced peer. So concurrent
> > > sotask() thread will just exit. This required to solve inode and so_rcv
> > > locks ordering. Also this removes re-locking from sofree() for all
> > > sockets.
> > 
> > Passed regress with witness.
> > 
> > One question, otherwise OK bluhm@
> > 
> > [skip] 
> > 
> > You change the order of EPIPE and ENOTCONN error checks here.
> > The difference is relevant, EPIPE may signal a SIGPIPE.
> > 
> > When writing into a socket that stops working, the error should be
> > EPIPE.  If there are two processes, A is writing to B, and B
> > disconnects the socket, then A should get a EPIPE.
> > 
> > Do you change the behavior?
> > 
> > In sosend() we have a SS_CANTSENDMORE check with EPIPE error.
> > So maybe the uipc_send() errors are irrelevant.
> > 
> 
> Yes, I change behavior. I had concerns , so I checked FreeBSD version of
> uipc_send() and found that they do "so->so_state & SS_ISCONNECTED" check
> before "so->so_snd.sb_state & SBS_CANTSENDMORE". But sosend_generic()
> has the different order. Since this code exists for a long time, I found
> this order not significant.
> 
> However, our fifofs is the only place where we do direct SS_CANTSENDMORE
> flag modifications. We could keep solock() around them for serialization
> with uipc_send() and keep existing checks order.
> 

Please note, this time we have places where we take `sb_mtx' mutex(9)
simultaneously for both `so_rcv' and `so_snd' and the `so_rcv' should be
locked first. We have two path groups, where `so_snd' and `so_rcv'
belong to different sockets. The one is the splicing sockets and the
second is the unix(4) sockets. There is the different groups without
intersection. Also, for unix(4) sockets case, SOCK_STREAM, SOCK_DGRAM
and SOCK_SEQPACKET are different and can't be connected to each outher.

Therefore, we could use different lock orders. The lock order for
SOCK_DGRAM should be kept because these sockets can be connected to
itself, so the lock order should be the same as in soreserve(). But
this is not true for SOCK_STREAM and SOCK_SEQPACKET. So, in the
uipc_send() `so_snd' could be locked first.

So the hypothetical uipc_send() from the distant future could be as
following without any additional solock() serialization. But since this
requires some tricks with witness(4) and makes locking much more harder
to understand, lets left it for the distant future.

uipc_send(struct socket *so, ...)
{
        /* skip */

        mtx_enter(so->so_snd.sb_mtx); 
        if (so->so_snd.sb_state & SS_CANTSENDMORE) {
                error = EPIPE;
                goto error;
        }
        if (unp->unp_conn == NULL) {
                error = ENOTCONN;
                goto error;
        }

        so2 = unp->unp_conn->unp_socket;

        mtx_enter(&so2->so_rcv.sb_mtx);
        /* sbappend*(so2, &so2->so_rcv, m) */
        so->so_snd.sb_mbcnt = so2->so_rcv.sb_mbcnt;
        so->so_snd.sb_cc = so2->so_rcv.sb_cc;
        if (so2->so_rcv.sb_cc > 0)
                dowakeup = 1;
        mtx_leave(&so2->so_rcv.sb_mtx);
        mtx_leave(&so->so_snd.sb_mtx);