Index | Thread | Search

From:
Alexander Bluhm <bluhm@openbsd.org>
Subject:
Re: Use WRITE_ONCE() to se set so_error in sosend()/soreceive()
To:
Vitaliy Makkoveev <mvs@openbsd.org>, Mark Kettenis <mark.kettenis@xs4all.nl>, OpenBSD tech <tech@openbsd.org>
Date:
Fri, 10 Jan 2025 22:26:39 +0100

Download raw body.

Thread
On Tue, Jan 07, 2025 at 04:00:31PM +0100, Martin Pieuchot wrote:
> On 07/01/25(Tue) 13:57, Vitaliy Makkoveev wrote:
> > 
> > 
> > > On 7 Jan 2025, at 13:45, Mark Kettenis <mark.kettenis@xs4all.nl> wrote:
> > > 
> > >> From: Vitaliy Makkoveev <mvs@openbsd.org>
> > >> Date: Mon, 6 Jan 2025 20:07:19 +0300
> > >> 
> > >>> On 6 Jan 2025, at 19:45, Mark Kettenis <mark.kettenis@xs4all.nl> wrote:
> > >>> 
> > >>>> Date: Mon, 6 Jan 2025 18:58:16 +0300
> > >>>> From: Vitaliy Makkoveev <mvs@openbsd.org>
> > >>>> 
> > >>>> This is the lockless access, so do it.
> > >>> 
> > >>> Sorry, but once again, I don't think this makes sense.  Atomic access
> > >>> doesn't mean blindly replacing assignments with
> > >>> READ_ONCE()/WRITE_ONCE() or atomic_load_int()/atomic_store_int().  The
> > >>> diff below makes me strongly suspect accessing so_error without
> > >>> holding some sort of lock isn't actually safe.  At the very minimum
> > >>> you're missing some memory barriers here!
> > >>> 
> > >>> 
> > >> 
> > >> This is not the blindly replacing, we need WRITE_ONCE()/READ_ONCE()
> > >> here to prevent reordering and be sure, so_error value is not cached.
> > > 
> > > But using READ_ONCE()/WRITE_ONCE() isn't enough to guarantee ordering
> > > between threads.  You need proper memory barriers for that.
> > > 
> > 
> > I???m not interesting in order between threads here, so_error value
> > could be lost even in the good old kernel locked or uniprocessor
> > times. I only want it to be loaded or writed each do {} while()
> > iteration, that???s the reason of *_ONCE() to exist.
> 
> I don't understand why do we need READ_ONCE() to ensure the value is
> loaded for every iteration of the loop.  Isn't already the case?  If
> not do the other fields `sb_state', `so_state', etc also need it?
> 
> > The synchronisation in this place made by dedicated locks.
> > 
> > > Now the question is what are you ordering against?  That isn't
> > > immediately obvious to me; marking a variable or struct member as
> > > "atomic" in the header file doesn't really help here...

The locking strategy for so_error is not that clear.

We do unlocked access here.  We want to avoid extra mutex just to
transfer the integer variable so_error to userland.  I personally
oppose accessing variables without any protections when used in
more than one thread.  READ_ONCE()/WRITE_ONCE() is the bare minumum
to avoid compiler misoptimizations.  So the code with the diff is
better than before.  The question is, is it enough or does it look
like false safety?

Code like this looks odd:
		if ((error = READ_ONCE(so->so_error))) {
			WRITE_ONCE(so->so_error, 0);
I would at least expect an atomic swap operation.  Userland threads
are no problem, they are locked, but the softnet thread can modify
so->so_error in between.  This was not the case in the old days,
BSD4.4 had splnet() around this code.

So userland may see only one error, when two softnet threads write
two errors.  But this can always happen as softnet threads just
write to the variable.  Userland sees at least one error, and that
is important.

Next question is, do we need memory barriers?  The code does several
checks and returns different errors to userland.  So in theory the
values stored in so->so_error and so->so_state & SS_ISCONNECTED
could be reordered.  The userland could miss one error and do the
SS_ISCONNECTED handling instead.  But again, does it matter when
the result of a TCP FIN is processed before an error from an ICMP
packet?  Such events are not really synchronized by the stack.

So I would say the proposed chunk is ok.

Same arguments for the MSG_PEEK chunk.

Next question was whether READ_ONCE() is necessary to read so->so_error
in each loop iteration.  In my opinion READ_ONCE() is more important
than WRITE_ONCE().  An compiler can optimize away read operations
from loops.  In this case sbwait() acts as a compiler barrier.  But
I prefer explicit READ_ONCE() in this case.  Better than guessing
what the compiler might do.

OK bluhm@ for the READ_ONCE() chunk.

I am also in favor for the WRITE_ONCE() chunks, but there are other
opinions.  We should find a consensus how to handle such integer
access.

bluhm

> > >>>> Index: sys/kern/uipc_socket.c
> > >>>> ===================================================================
> > >>>> RCS file: /cvs/src/sys/kern/uipc_socket.c,v
> > >>>> diff -u -p -r1.356 uipc_socket.c
> > >>>> --- sys/kern/uipc_socket.c	4 Jan 2025 15:57:02 -0000	1.356
> > >>>> +++ sys/kern/uipc_socket.c	6 Jan 2025 15:53:54 -0000
> > >>>> @@ -640,7 +640,7 @@ restart:
> > >>>> 		if (so->so_snd.sb_state & SS_CANTSENDMORE)
> > >>>> 			snderr(EPIPE);
> > >>>> 		if ((error = READ_ONCE(so->so_error))) {
> > >>>> -			so->so_error = 0;
> > >>>> +			WRITE_ONCE(so->so_error, 0);
> > >>>> 			snderr(error);
> > >>>> 		}
> > >>>> 		if ((so->so_state & SS_ISCONNECTED) == 0) {
> > >>>> @@ -934,7 +934,7 @@ restart:
> > >>>> 				goto dontblock;
> > >>>> 			error = error2;
> > >>>> 			if ((flags & MSG_PEEK) == 0)
> > >>>> -				so->so_error = 0;
> > >>>> +				WRITE_ONCE(so->so_error, 0);
> > >>>> 			goto release;
> > >>>> 		}
> > >>>> 		if (so->so_rcv.sb_state & SS_CANTRCVMORE) {
> > >>>> @@ -1204,7 +1204,7 @@ dontblock:
> > >>>> 		while (flags & MSG_WAITALL && m == NULL && uio->uio_resid > 0 &&
> > >>>> 		    !sosendallatonce(so) && !nextrecord) {
> > >>>> 			if (so->so_rcv.sb_state & SS_CANTRCVMORE ||
> > >>>> -			    so->so_error)
> > >>>> +			    READ_ONCE(so->so_error))
> > >>>> 				break;
> > >>>> 			SBLASTRECORDCHK(&so->so_rcv, "soreceive sbwait 2");
> > >>>> 			SBLASTMBUFCHK(&so->so_rcv, "soreceive sbwait 2");
> > 
>