Index | Thread | Search

From:
Claudio Jeker <cjeker@diehard.n-r-g.com>
Subject:
Re: Use WRITE_ONCE() to se set so_error in sosend()/soreceive()
To:
Alexander Bluhm <bluhm@openbsd.org>
Cc:
Vitaliy Makkoveev <mvs@openbsd.org>, Mark Kettenis <mark.kettenis@xs4all.nl>, OpenBSD tech <tech@openbsd.org>
Date:
Sat, 11 Jan 2025 14:15:09 +0100

Download raw body.

Thread
On Fri, Jan 10, 2025 at 10:26:39PM +0100, Alexander Bluhm wrote:
> On Tue, Jan 07, 2025 at 04:00:31PM +0100, Martin Pieuchot wrote:
> > On 07/01/25(Tue) 13:57, Vitaliy Makkoveev wrote:
> > > 
> > > 
> > > > On 7 Jan 2025, at 13:45, Mark Kettenis <mark.kettenis@xs4all.nl> wrote:
> > > > 
> > > >> From: Vitaliy Makkoveev <mvs@openbsd.org>
> > > >> Date: Mon, 6 Jan 2025 20:07:19 +0300
> > > >> 
> > > >>> On 6 Jan 2025, at 19:45, Mark Kettenis <mark.kettenis@xs4all.nl> wrote:
> > > >>> 
> > > >>>> Date: Mon, 6 Jan 2025 18:58:16 +0300
> > > >>>> From: Vitaliy Makkoveev <mvs@openbsd.org>
> > > >>>> 
> > > >>>> This is the lockless access, so do it.
> > > >>> 
> > > >>> Sorry, but once again, I don't think this makes sense.  Atomic access
> > > >>> doesn't mean blindly replacing assignments with
> > > >>> READ_ONCE()/WRITE_ONCE() or atomic_load_int()/atomic_store_int().  The
> > > >>> diff below makes me strongly suspect accessing so_error without
> > > >>> holding some sort of lock isn't actually safe.  At the very minimum
> > > >>> you're missing some memory barriers here!
> > > >>> 
> > > >>> 
> > > >> 
> > > >> This is not the blindly replacing, we need WRITE_ONCE()/READ_ONCE()
> > > >> here to prevent reordering and be sure, so_error value is not cached.
> > > > 
> > > > But using READ_ONCE()/WRITE_ONCE() isn't enough to guarantee ordering
> > > > between threads.  You need proper memory barriers for that.
> > > > 
> > > 
> > > I???m not interesting in order between threads here, so_error value
> > > could be lost even in the good old kernel locked or uniprocessor
> > > times. I only want it to be loaded or writed each do {} while()
> > > iteration, that???s the reason of *_ONCE() to exist.
> > 
> > I don't understand why do we need READ_ONCE() to ensure the value is
> > loaded for every iteration of the loop.  Isn't already the case?  If
> > not do the other fields `sb_state', `so_state', etc also need it?
> > 
> > > The synchronisation in this place made by dedicated locks.
> > > 
> > > > Now the question is what are you ordering against?  That isn't
> > > > immediately obvious to me; marking a variable or struct member as
> > > > "atomic" in the header file doesn't really help here...
> 
> The locking strategy for so_error is not that clear.
> 
> We do unlocked access here.  We want to avoid extra mutex just to
> transfer the integer variable so_error to userland.  I personally
> oppose accessing variables without any protections when used in
> more than one thread.  READ_ONCE()/WRITE_ONCE() is the bare minumum
> to avoid compiler misoptimizations.  So the code with the diff is
> better than before.  The question is, is it enough or does it look
> like false safety?
> 
> Code like this looks odd:
> 		if ((error = READ_ONCE(so->so_error))) {
> 			WRITE_ONCE(so->so_error, 0);
> I would at least expect an atomic swap operation.  Userland threads
> are no problem, they are locked, but the softnet thread can modify
> so->so_error in between.  This was not the case in the old days,
> BSD4.4 had splnet() around this code.
> 
> So userland may see only one error, when two softnet threads write
> two errors.  But this can always happen as softnet threads just
> write to the variable.  Userland sees at least one error, and that
> is important.
> 
> Next question is, do we need memory barriers?  The code does several
> checks and returns different errors to userland.  So in theory the
> values stored in so->so_error and so->so_state & SS_ISCONNECTED
> could be reordered.  The userland could miss one error and do the
> SS_ISCONNECTED handling instead.  But again, does it matter when
> the result of a TCP FIN is processed before an error from an ICMP
> packet?  Such events are not really synchronized by the stack.
> 
> So I would say the proposed chunk is ok.
> 
> Same arguments for the MSG_PEEK chunk.
> 
> Next question was whether READ_ONCE() is necessary to read so->so_error
> in each loop iteration.  In my opinion READ_ONCE() is more important
> than WRITE_ONCE().  An compiler can optimize away read operations
> from loops.  In this case sbwait() acts as a compiler barrier.  But
> I prefer explicit READ_ONCE() in this case.  Better than guessing
> what the compiler might do.
> 
> OK bluhm@ for the READ_ONCE() chunk.
> 
> I am also in favor for the WRITE_ONCE() chunks, but there are other
> opinions.  We should find a consensus how to handle such integer
> access.

I don't understand this discussion. Is so_error a real atomic value or
does it depend on other state changes as well?

When I look at the code e.g. tcp_drop() sets so_errno but also closes the
connection. All of this should happen in an atomic fashion not just
assigning errno to so->so_error.

Also doing a READ_ONCE() followed by a WRITE_ONCE() is not an atomic
operation. So I really think this may lure people into thinking that
this is safe while in reality it is not.
 
Skipping proper locking because it may be expensive is a very bad choice.
We really should make sure the code is correct even if that requires
additional locks (or proper understanding of which locks protect what).

> bluhm
> 
> > > >>>> Index: sys/kern/uipc_socket.c
> > > >>>> ===================================================================
> > > >>>> RCS file: /cvs/src/sys/kern/uipc_socket.c,v
> > > >>>> diff -u -p -r1.356 uipc_socket.c
> > > >>>> --- sys/kern/uipc_socket.c	4 Jan 2025 15:57:02 -0000	1.356
> > > >>>> +++ sys/kern/uipc_socket.c	6 Jan 2025 15:53:54 -0000
> > > >>>> @@ -640,7 +640,7 @@ restart:
> > > >>>> 		if (so->so_snd.sb_state & SS_CANTSENDMORE)
> > > >>>> 			snderr(EPIPE);
> > > >>>> 		if ((error = READ_ONCE(so->so_error))) {
> > > >>>> -			so->so_error = 0;
> > > >>>> +			WRITE_ONCE(so->so_error, 0);
> > > >>>> 			snderr(error);
> > > >>>> 		}
> > > >>>> 		if ((so->so_state & SS_ISCONNECTED) == 0) {
> > > >>>> @@ -934,7 +934,7 @@ restart:
> > > >>>> 				goto dontblock;
> > > >>>> 			error = error2;
> > > >>>> 			if ((flags & MSG_PEEK) == 0)
> > > >>>> -				so->so_error = 0;
> > > >>>> +				WRITE_ONCE(so->so_error, 0);
> > > >>>> 			goto release;
> > > >>>> 		}
> > > >>>> 		if (so->so_rcv.sb_state & SS_CANTRCVMORE) {
> > > >>>> @@ -1204,7 +1204,7 @@ dontblock:
> > > >>>> 		while (flags & MSG_WAITALL && m == NULL && uio->uio_resid > 0 &&
> > > >>>> 		    !sosendallatonce(so) && !nextrecord) {
> > > >>>> 			if (so->so_rcv.sb_state & SS_CANTRCVMORE ||
> > > >>>> -			    so->so_error)
> > > >>>> +			    READ_ONCE(so->so_error))
> > > >>>> 				break;
> > > >>>> 			SBLASTRECORDCHK(&so->so_rcv, "soreceive sbwait 2");
> > > >>>> 			SBLASTMBUFCHK(&so->so_rcv, "soreceive sbwait 2");
> > > 
> > 
> 

-- 
:wq Claudio