From: Mark Kettenis Subject: Re: Use WRITE_ONCE() to se set so_error in sosend()/soreceive() To: Claudio Jeker Cc: bluhm@openbsd.org, mvs@openbsd.org, tech@openbsd.org Date: Sat, 11 Jan 2025 14:19:20 +0100 > Date: Sat, 11 Jan 2025 14:15:09 +0100 > From: Claudio Jeker > > On Fri, Jan 10, 2025 at 10:26:39PM +0100, Alexander Bluhm wrote: > > On Tue, Jan 07, 2025 at 04:00:31PM +0100, Martin Pieuchot wrote: > > > On 07/01/25(Tue) 13:57, Vitaliy Makkoveev wrote: > > > > > > > > > > > > > On 7 Jan 2025, at 13:45, Mark Kettenis wrote: > > > > > > > > > >> From: Vitaliy Makkoveev > > > > >> Date: Mon, 6 Jan 2025 20:07:19 +0300 > > > > >> > > > > >>> On 6 Jan 2025, at 19:45, Mark Kettenis wrote: > > > > >>> > > > > >>>> Date: Mon, 6 Jan 2025 18:58:16 +0300 > > > > >>>> From: Vitaliy Makkoveev > > > > >>>> > > > > >>>> This is the lockless access, so do it. > > > > >>> > > > > >>> Sorry, but once again, I don't think this makes sense. Atomic access > > > > >>> doesn't mean blindly replacing assignments with > > > > >>> READ_ONCE()/WRITE_ONCE() or atomic_load_int()/atomic_store_int(). The > > > > >>> diff below makes me strongly suspect accessing so_error without > > > > >>> holding some sort of lock isn't actually safe. At the very minimum > > > > >>> you're missing some memory barriers here! > > > > >>> > > > > >>> > > > > >> > > > > >> This is not the blindly replacing, we need WRITE_ONCE()/READ_ONCE() > > > > >> here to prevent reordering and be sure, so_error value is not cached. > > > > > > > > > > But using READ_ONCE()/WRITE_ONCE() isn't enough to guarantee ordering > > > > > between threads. You need proper memory barriers for that. > > > > > > > > > > > > > I???m not interesting in order between threads here, so_error value > > > > could be lost even in the good old kernel locked or uniprocessor > > > > times. I only want it to be loaded or writed each do {} while() > > > > iteration, that???s the reason of *_ONCE() to exist. > > > > > > I don't understand why do we need READ_ONCE() to ensure the value is > > > loaded for every iteration of the loop. Isn't already the case? If > > > not do the other fields `sb_state', `so_state', etc also need it? > > > > > > > The synchronisation in this place made by dedicated locks. > > > > > > > > > Now the question is what are you ordering against? That isn't > > > > > immediately obvious to me; marking a variable or struct member as > > > > > "atomic" in the header file doesn't really help here... > > > > The locking strategy for so_error is not that clear. > > > > We do unlocked access here. We want to avoid extra mutex just to > > transfer the integer variable so_error to userland. I personally > > oppose accessing variables without any protections when used in > > more than one thread. READ_ONCE()/WRITE_ONCE() is the bare minumum > > to avoid compiler misoptimizations. So the code with the diff is > > better than before. The question is, is it enough or does it look > > like false safety? > > > > Code like this looks odd: > > if ((error = READ_ONCE(so->so_error))) { > > WRITE_ONCE(so->so_error, 0); > > I would at least expect an atomic swap operation. Userland threads > > are no problem, they are locked, but the softnet thread can modify > > so->so_error in between. This was not the case in the old days, > > BSD4.4 had splnet() around this code. > > > > So userland may see only one error, when two softnet threads write > > two errors. But this can always happen as softnet threads just > > write to the variable. Userland sees at least one error, and that > > is important. > > > > Next question is, do we need memory barriers? The code does several > > checks and returns different errors to userland. So in theory the > > values stored in so->so_error and so->so_state & SS_ISCONNECTED > > could be reordered. The userland could miss one error and do the > > SS_ISCONNECTED handling instead. But again, does it matter when > > the result of a TCP FIN is processed before an error from an ICMP > > packet? Such events are not really synchronized by the stack. > > > > So I would say the proposed chunk is ok. > > > > Same arguments for the MSG_PEEK chunk. > > > > Next question was whether READ_ONCE() is necessary to read so->so_error > > in each loop iteration. In my opinion READ_ONCE() is more important > > than WRITE_ONCE(). An compiler can optimize away read operations > > from loops. In this case sbwait() acts as a compiler barrier. But > > I prefer explicit READ_ONCE() in this case. Better than guessing > > what the compiler might do. > > > > OK bluhm@ for the READ_ONCE() chunk. > > > > I am also in favor for the WRITE_ONCE() chunks, but there are other > > opinions. We should find a consensus how to handle such integer > > access. > > I don't understand this discussion. Is so_error a real atomic value or > does it depend on other state changes as well? > > When I look at the code e.g. tcp_drop() sets so_errno but also closes the > connection. All of this should happen in an atomic fashion not just > assigning errno to so->so_error. > > Also doing a READ_ONCE() followed by a WRITE_ONCE() is not an atomic > operation. So I really think this may lure people into thinking that > this is safe while in reality it is not. > > Skipping proper locking because it may be expensive is a very bad choice. > We really should make sure the code is correct even if that requires > additional locks (or proper understanding of which locks protect what). Thanks Claudio. That is exactly the point I'm trying to make. > > > > >>>> Index: sys/kern/uipc_socket.c > > > > >>>> =================================================================== > > > > >>>> RCS file: /cvs/src/sys/kern/uipc_socket.c,v > > > > >>>> diff -u -p -r1.356 uipc_socket.c > > > > >>>> --- sys/kern/uipc_socket.c 4 Jan 2025 15:57:02 -0000 1.356 > > > > >>>> +++ sys/kern/uipc_socket.c 6 Jan 2025 15:53:54 -0000 > > > > >>>> @@ -640,7 +640,7 @@ restart: > > > > >>>> if (so->so_snd.sb_state & SS_CANTSENDMORE) > > > > >>>> snderr(EPIPE); > > > > >>>> if ((error = READ_ONCE(so->so_error))) { > > > > >>>> - so->so_error = 0; > > > > >>>> + WRITE_ONCE(so->so_error, 0); > > > > >>>> snderr(error); > > > > >>>> } > > > > >>>> if ((so->so_state & SS_ISCONNECTED) == 0) { > > > > >>>> @@ -934,7 +934,7 @@ restart: > > > > >>>> goto dontblock; > > > > >>>> error = error2; > > > > >>>> if ((flags & MSG_PEEK) == 0) > > > > >>>> - so->so_error = 0; > > > > >>>> + WRITE_ONCE(so->so_error, 0); > > > > >>>> goto release; > > > > >>>> } > > > > >>>> if (so->so_rcv.sb_state & SS_CANTRCVMORE) { > > > > >>>> @@ -1204,7 +1204,7 @@ dontblock: > > > > >>>> while (flags & MSG_WAITALL && m == NULL && uio->uio_resid > 0 && > > > > >>>> !sosendallatonce(so) && !nextrecord) { > > > > >>>> if (so->so_rcv.sb_state & SS_CANTRCVMORE || > > > > >>>> - so->so_error) > > > > >>>> + READ_ONCE(so->so_error)) > > > > >>>> break; > > > > >>>> SBLASTRECORDCHK(&so->so_rcv, "soreceive sbwait 2"); > > > > >>>> SBLASTMBUFCHK(&so->so_rcv, "soreceive sbwait 2"); > > > > > > > > > > > -- > :wq Claudio >