From: Claudio Jeker Subject: Re: Use WRITE_ONCE() to se set so_error in sosend()/soreceive() To: Alexander Bluhm Cc: Vitaliy Makkoveev , Mark Kettenis , OpenBSD tech Date: Sat, 11 Jan 2025 14:15:09 +0100 On Fri, Jan 10, 2025 at 10:26:39PM +0100, Alexander Bluhm wrote: > On Tue, Jan 07, 2025 at 04:00:31PM +0100, Martin Pieuchot wrote: > > On 07/01/25(Tue) 13:57, Vitaliy Makkoveev wrote: > > > > > > > > > > On 7 Jan 2025, at 13:45, Mark Kettenis wrote: > > > > > > > >> From: Vitaliy Makkoveev > > > >> Date: Mon, 6 Jan 2025 20:07:19 +0300 > > > >> > > > >>> On 6 Jan 2025, at 19:45, Mark Kettenis wrote: > > > >>> > > > >>>> Date: Mon, 6 Jan 2025 18:58:16 +0300 > > > >>>> From: Vitaliy Makkoveev > > > >>>> > > > >>>> This is the lockless access, so do it. > > > >>> > > > >>> Sorry, but once again, I don't think this makes sense. Atomic access > > > >>> doesn't mean blindly replacing assignments with > > > >>> READ_ONCE()/WRITE_ONCE() or atomic_load_int()/atomic_store_int(). The > > > >>> diff below makes me strongly suspect accessing so_error without > > > >>> holding some sort of lock isn't actually safe. At the very minimum > > > >>> you're missing some memory barriers here! > > > >>> > > > >>> > > > >> > > > >> This is not the blindly replacing, we need WRITE_ONCE()/READ_ONCE() > > > >> here to prevent reordering and be sure, so_error value is not cached. > > > > > > > > But using READ_ONCE()/WRITE_ONCE() isn't enough to guarantee ordering > > > > between threads. You need proper memory barriers for that. > > > > > > > > > > I???m not interesting in order between threads here, so_error value > > > could be lost even in the good old kernel locked or uniprocessor > > > times. I only want it to be loaded or writed each do {} while() > > > iteration, that???s the reason of *_ONCE() to exist. > > > > I don't understand why do we need READ_ONCE() to ensure the value is > > loaded for every iteration of the loop. Isn't already the case? If > > not do the other fields `sb_state', `so_state', etc also need it? > > > > > The synchronisation in this place made by dedicated locks. > > > > > > > Now the question is what are you ordering against? That isn't > > > > immediately obvious to me; marking a variable or struct member as > > > > "atomic" in the header file doesn't really help here... > > The locking strategy for so_error is not that clear. > > We do unlocked access here. We want to avoid extra mutex just to > transfer the integer variable so_error to userland. I personally > oppose accessing variables without any protections when used in > more than one thread. READ_ONCE()/WRITE_ONCE() is the bare minumum > to avoid compiler misoptimizations. So the code with the diff is > better than before. The question is, is it enough or does it look > like false safety? > > Code like this looks odd: > if ((error = READ_ONCE(so->so_error))) { > WRITE_ONCE(so->so_error, 0); > I would at least expect an atomic swap operation. Userland threads > are no problem, they are locked, but the softnet thread can modify > so->so_error in between. This was not the case in the old days, > BSD4.4 had splnet() around this code. > > So userland may see only one error, when two softnet threads write > two errors. But this can always happen as softnet threads just > write to the variable. Userland sees at least one error, and that > is important. > > Next question is, do we need memory barriers? The code does several > checks and returns different errors to userland. So in theory the > values stored in so->so_error and so->so_state & SS_ISCONNECTED > could be reordered. The userland could miss one error and do the > SS_ISCONNECTED handling instead. But again, does it matter when > the result of a TCP FIN is processed before an error from an ICMP > packet? Such events are not really synchronized by the stack. > > So I would say the proposed chunk is ok. > > Same arguments for the MSG_PEEK chunk. > > Next question was whether READ_ONCE() is necessary to read so->so_error > in each loop iteration. In my opinion READ_ONCE() is more important > than WRITE_ONCE(). An compiler can optimize away read operations > from loops. In this case sbwait() acts as a compiler barrier. But > I prefer explicit READ_ONCE() in this case. Better than guessing > what the compiler might do. > > OK bluhm@ for the READ_ONCE() chunk. > > I am also in favor for the WRITE_ONCE() chunks, but there are other > opinions. We should find a consensus how to handle such integer > access. I don't understand this discussion. Is so_error a real atomic value or does it depend on other state changes as well? When I look at the code e.g. tcp_drop() sets so_errno but also closes the connection. All of this should happen in an atomic fashion not just assigning errno to so->so_error. Also doing a READ_ONCE() followed by a WRITE_ONCE() is not an atomic operation. So I really think this may lure people into thinking that this is safe while in reality it is not. Skipping proper locking because it may be expensive is a very bad choice. We really should make sure the code is correct even if that requires additional locks (or proper understanding of which locks protect what). > bluhm > > > > >>>> Index: sys/kern/uipc_socket.c > > > >>>> =================================================================== > > > >>>> RCS file: /cvs/src/sys/kern/uipc_socket.c,v > > > >>>> diff -u -p -r1.356 uipc_socket.c > > > >>>> --- sys/kern/uipc_socket.c 4 Jan 2025 15:57:02 -0000 1.356 > > > >>>> +++ sys/kern/uipc_socket.c 6 Jan 2025 15:53:54 -0000 > > > >>>> @@ -640,7 +640,7 @@ restart: > > > >>>> if (so->so_snd.sb_state & SS_CANTSENDMORE) > > > >>>> snderr(EPIPE); > > > >>>> if ((error = READ_ONCE(so->so_error))) { > > > >>>> - so->so_error = 0; > > > >>>> + WRITE_ONCE(so->so_error, 0); > > > >>>> snderr(error); > > > >>>> } > > > >>>> if ((so->so_state & SS_ISCONNECTED) == 0) { > > > >>>> @@ -934,7 +934,7 @@ restart: > > > >>>> goto dontblock; > > > >>>> error = error2; > > > >>>> if ((flags & MSG_PEEK) == 0) > > > >>>> - so->so_error = 0; > > > >>>> + WRITE_ONCE(so->so_error, 0); > > > >>>> goto release; > > > >>>> } > > > >>>> if (so->so_rcv.sb_state & SS_CANTRCVMORE) { > > > >>>> @@ -1204,7 +1204,7 @@ dontblock: > > > >>>> while (flags & MSG_WAITALL && m == NULL && uio->uio_resid > 0 && > > > >>>> !sosendallatonce(so) && !nextrecord) { > > > >>>> if (so->so_rcv.sb_state & SS_CANTRCVMORE || > > > >>>> - so->so_error) > > > >>>> + READ_ONCE(so->so_error)) > > > >>>> break; > > > >>>> SBLASTRECORDCHK(&so->so_rcv, "soreceive sbwait 2"); > > > >>>> SBLASTMBUFCHK(&so->so_rcv, "soreceive sbwait 2"); > > > > > > -- :wq Claudio