Download raw body.
Use WRITE_ONCE() to se set so_error in sosend()/soreceive()
On Fri, Jan 10, 2025 at 10:26:39PM +0100, Alexander Bluhm wrote:
> On Tue, Jan 07, 2025 at 04:00:31PM +0100, Martin Pieuchot wrote:
> > On 07/01/25(Tue) 13:57, Vitaliy Makkoveev wrote:
> > >
> > >
> > > > On 7 Jan 2025, at 13:45, Mark Kettenis <mark.kettenis@xs4all.nl> wrote:
> > > >
> > > >> From: Vitaliy Makkoveev <mvs@openbsd.org>
> > > >> Date: Mon, 6 Jan 2025 20:07:19 +0300
> > > >>
> > > >>> On 6 Jan 2025, at 19:45, Mark Kettenis <mark.kettenis@xs4all.nl> wrote:
> > > >>>
> > > >>>> Date: Mon, 6 Jan 2025 18:58:16 +0300
> > > >>>> From: Vitaliy Makkoveev <mvs@openbsd.org>
> > > >>>>
> > > >>>> This is the lockless access, so do it.
> > > >>>
> > > >>> Sorry, but once again, I don't think this makes sense. Atomic access
> > > >>> doesn't mean blindly replacing assignments with
> > > >>> READ_ONCE()/WRITE_ONCE() or atomic_load_int()/atomic_store_int(). The
> > > >>> diff below makes me strongly suspect accessing so_error without
> > > >>> holding some sort of lock isn't actually safe. At the very minimum
> > > >>> you're missing some memory barriers here!
> > > >>>
> > > >>>
> > > >>
> > > >> This is not the blindly replacing, we need WRITE_ONCE()/READ_ONCE()
> > > >> here to prevent reordering and be sure, so_error value is not cached.
> > > >
> > > > But using READ_ONCE()/WRITE_ONCE() isn't enough to guarantee ordering
> > > > between threads. You need proper memory barriers for that.
> > > >
> > >
> > > I???m not interesting in order between threads here, so_error value
> > > could be lost even in the good old kernel locked or uniprocessor
> > > times. I only want it to be loaded or writed each do {} while()
> > > iteration, that???s the reason of *_ONCE() to exist.
> >
> > I don't understand why do we need READ_ONCE() to ensure the value is
> > loaded for every iteration of the loop. Isn't already the case? If
> > not do the other fields `sb_state', `so_state', etc also need it?
> >
> > > The synchronisation in this place made by dedicated locks.
> > >
> > > > Now the question is what are you ordering against? That isn't
> > > > immediately obvious to me; marking a variable or struct member as
> > > > "atomic" in the header file doesn't really help here...
>
> The locking strategy for so_error is not that clear.
>
> We do unlocked access here. We want to avoid extra mutex just to
> transfer the integer variable so_error to userland. I personally
> oppose accessing variables without any protections when used in
> more than one thread. READ_ONCE()/WRITE_ONCE() is the bare minumum
> to avoid compiler misoptimizations. So the code with the diff is
> better than before. The question is, is it enough or does it look
> like false safety?
>
> Code like this looks odd:
> if ((error = READ_ONCE(so->so_error))) {
> WRITE_ONCE(so->so_error, 0);
> I would at least expect an atomic swap operation. Userland threads
> are no problem, they are locked, but the softnet thread can modify
> so->so_error in between. This was not the case in the old days,
> BSD4.4 had splnet() around this code.
>
> So userland may see only one error, when two softnet threads write
> two errors. But this can always happen as softnet threads just
> write to the variable. Userland sees at least one error, and that
> is important.
>
> Next question is, do we need memory barriers? The code does several
> checks and returns different errors to userland. So in theory the
> values stored in so->so_error and so->so_state & SS_ISCONNECTED
> could be reordered. The userland could miss one error and do the
> SS_ISCONNECTED handling instead. But again, does it matter when
> the result of a TCP FIN is processed before an error from an ICMP
> packet? Such events are not really synchronized by the stack.
>
> So I would say the proposed chunk is ok.
>
> Same arguments for the MSG_PEEK chunk.
>
> Next question was whether READ_ONCE() is necessary to read so->so_error
> in each loop iteration. In my opinion READ_ONCE() is more important
> than WRITE_ONCE(). An compiler can optimize away read operations
> from loops. In this case sbwait() acts as a compiler barrier. But
> I prefer explicit READ_ONCE() in this case. Better than guessing
> what the compiler might do.
>
> OK bluhm@ for the READ_ONCE() chunk.
>
> I am also in favor for the WRITE_ONCE() chunks, but there are other
> opinions. We should find a consensus how to handle such integer
> access.
I don't understand this discussion. Is so_error a real atomic value or
does it depend on other state changes as well?
When I look at the code e.g. tcp_drop() sets so_errno but also closes the
connection. All of this should happen in an atomic fashion not just
assigning errno to so->so_error.
Also doing a READ_ONCE() followed by a WRITE_ONCE() is not an atomic
operation. So I really think this may lure people into thinking that
this is safe while in reality it is not.
Skipping proper locking because it may be expensive is a very bad choice.
We really should make sure the code is correct even if that requires
additional locks (or proper understanding of which locks protect what).
> bluhm
>
> > > >>>> Index: sys/kern/uipc_socket.c
> > > >>>> ===================================================================
> > > >>>> RCS file: /cvs/src/sys/kern/uipc_socket.c,v
> > > >>>> diff -u -p -r1.356 uipc_socket.c
> > > >>>> --- sys/kern/uipc_socket.c 4 Jan 2025 15:57:02 -0000 1.356
> > > >>>> +++ sys/kern/uipc_socket.c 6 Jan 2025 15:53:54 -0000
> > > >>>> @@ -640,7 +640,7 @@ restart:
> > > >>>> if (so->so_snd.sb_state & SS_CANTSENDMORE)
> > > >>>> snderr(EPIPE);
> > > >>>> if ((error = READ_ONCE(so->so_error))) {
> > > >>>> - so->so_error = 0;
> > > >>>> + WRITE_ONCE(so->so_error, 0);
> > > >>>> snderr(error);
> > > >>>> }
> > > >>>> if ((so->so_state & SS_ISCONNECTED) == 0) {
> > > >>>> @@ -934,7 +934,7 @@ restart:
> > > >>>> goto dontblock;
> > > >>>> error = error2;
> > > >>>> if ((flags & MSG_PEEK) == 0)
> > > >>>> - so->so_error = 0;
> > > >>>> + WRITE_ONCE(so->so_error, 0);
> > > >>>> goto release;
> > > >>>> }
> > > >>>> if (so->so_rcv.sb_state & SS_CANTRCVMORE) {
> > > >>>> @@ -1204,7 +1204,7 @@ dontblock:
> > > >>>> while (flags & MSG_WAITALL && m == NULL && uio->uio_resid > 0 &&
> > > >>>> !sosendallatonce(so) && !nextrecord) {
> > > >>>> if (so->so_rcv.sb_state & SS_CANTRCVMORE ||
> > > >>>> - so->so_error)
> > > >>>> + READ_ONCE(so->so_error))
> > > >>>> break;
> > > >>>> SBLASTRECORDCHK(&so->so_rcv, "soreceive sbwait 2");
> > > >>>> SBLASTMBUFCHK(&so->so_rcv, "soreceive sbwait 2");
> > >
> >
>
--
:wq Claudio
Use WRITE_ONCE() to se set so_error in sosend()/soreceive()