From: Vitaliy Makkoveev <mvs@openbsd.org>
Subject: Re: Use WRITE_ONCE() to se set so_error in sosend()/soreceive()
To: Mark Kettenis <mark.kettenis@xs4all.nl>
Cc: Claudio Jeker <cjeker@diehard.n-r-g.com>, bluhm@openbsd.org, tech@openbsd.org
Date: Sat, 11 Jan 2025 18:25:11 +0300

> On 11 Jan 2025, at 16:19, Mark Kettenis <mark.kettenis@xs4all.nl> wrote:
> 
>> Date: Sat, 11 Jan 2025 14:15:09 +0100
>> From: Claudio Jeker <cjeker@diehard.n-r-g.com>
>> 
>> On Fri, Jan 10, 2025 at 10:26:39PM +0100, Alexander Bluhm wrote:
>>> On Tue, Jan 07, 2025 at 04:00:31PM +0100, Martin Pieuchot wrote:
>>>> On 07/01/25(Tue) 13:57, Vitaliy Makkoveev wrote:
>>>>> 
>>>>> 
>>>>>> On 7 Jan 2025, at 13:45, Mark Kettenis <mark.kettenis@xs4all.nl> wrote:
>>>>>> 
>>>>>>> From: Vitaliy Makkoveev <mvs@openbsd.org>
>>>>>>> Date: Mon, 6 Jan 2025 20:07:19 +0300
>>>>>>> 
>>>>>>>> On 6 Jan 2025, at 19:45, Mark Kettenis <mark.kettenis@xs4all.nl> wrote:
>>>>>>>> 
>>>>>>>>> Date: Mon, 6 Jan 2025 18:58:16 +0300
>>>>>>>>> From: Vitaliy Makkoveev <mvs@openbsd.org>
>>>>>>>>> 
>>>>>>>>> This is the lockless access, so do it.
>>>>>>>> 
>>>>>>>> Sorry, but once again, I don't think this makes sense.  Atomic access
>>>>>>>> doesn't mean blindly replacing assignments with
>>>>>>>> READ_ONCE()/WRITE_ONCE() or atomic_load_int()/atomic_store_int().  The
>>>>>>>> diff below makes me strongly suspect accessing so_error without
>>>>>>>> holding some sort of lock isn't actually safe.  At the very minimum
>>>>>>>> you're missing some memory barriers here!
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> This is not the blindly replacing, we need WRITE_ONCE()/READ_ONCE()
>>>>>>> here to prevent reordering and be sure, so_error value is not cached.
>>>>>> 
>>>>>> But using READ_ONCE()/WRITE_ONCE() isn't enough to guarantee ordering
>>>>>> between threads.  You need proper memory barriers for that.
>>>>>> 
>>>>> 
>>>>> I???m not interesting in order between threads here, so_error value
>>>>> could be lost even in the good old kernel locked or uniprocessor
>>>>> times. I only want it to be loaded or writed each do {} while()
>>>>> iteration, that???s the reason of *_ONCE() to exist.
>>>> 
>>>> I don't understand why do we need READ_ONCE() to ensure the value is
>>>> loaded for every iteration of the loop.  Isn't already the case?  If
>>>> not do the other fields `sb_state', `so_state', etc also need it?
>>>> 
>>>>> The synchronisation in this place made by dedicated locks.
>>>>> 
>>>>>> Now the question is what are you ordering against?  That isn't
>>>>>> immediately obvious to me; marking a variable or struct member as
>>>>>> "atomic" in the header file doesn't really help here...
>>> 
>>> The locking strategy for so_error is not that clear.
>>> 
>>> We do unlocked access here.  We want to avoid extra mutex just to
>>> transfer the integer variable so_error to userland.  I personally
>>> oppose accessing variables without any protections when used in
>>> more than one thread.  READ_ONCE()/WRITE_ONCE() is the bare minumum
>>> to avoid compiler misoptimizations.  So the code with the diff is
>>> better than before.  The question is, is it enough or does it look
>>> like false safety?
>>> 
>>> Code like this looks odd:
>>> 		if ((error = READ_ONCE(so->so_error))) {
>>> 			WRITE_ONCE(so->so_error, 0);
>>> I would at least expect an atomic swap operation.  Userland threads
>>> are no problem, they are locked, but the softnet thread can modify
>>> so->so_error in between.  This was not the case in the old days,
>>> BSD4.4 had splnet() around this code.
>>> 
>>> So userland may see only one error, when two softnet threads write
>>> two errors.  But this can always happen as softnet threads just
>>> write to the variable.  Userland sees at least one error, and that
>>> is important.
>>> 
>>> Next question is, do we need memory barriers?  The code does several
>>> checks and returns different errors to userland.  So in theory the
>>> values stored in so->so_error and so->so_state & SS_ISCONNECTED
>>> could be reordered.  The userland could miss one error and do the
>>> SS_ISCONNECTED handling instead.  But again, does it matter when
>>> the result of a TCP FIN is processed before an error from an ICMP
>>> packet?  Such events are not really synchronized by the stack.
>>> 
>>> So I would say the proposed chunk is ok.
>>> 
>>> Same arguments for the MSG_PEEK chunk.
>>> 
>>> Next question was whether READ_ONCE() is necessary to read so->so_error
>>> in each loop iteration.  In my opinion READ_ONCE() is more important
>>> than WRITE_ONCE().  An compiler can optimize away read operations
>>> from loops.  In this case sbwait() acts as a compiler barrier.  But
>>> I prefer explicit READ_ONCE() in this case.  Better than guessing
>>> what the compiler might do.
>>> 
>>> OK bluhm@ for the READ_ONCE() chunk.
>>> 
>>> I am also in favor for the WRITE_ONCE() chunks, but there are other
>>> opinions.  We should find a consensus how to handle such integer
>>> access.
>> 
>> I don't understand this discussion. Is so_error a real atomic value or
>> does it depend on other state changes as well?
>> 
>> When I look at the code e.g. tcp_drop() sets so_errno but also closes the
>> connection. All of this should happen in an atomic fashion not just
>> assigning errno to so->so_error.
>> 
>> Also doing a READ_ONCE() followed by a WRITE_ONCE() is not an atomic
>> operation. So I really think this may lure people into thinking that
>> this is safe while in reality it is not.
>> 
>> Skipping proper locking because it may be expensive is a very bad choice.
>> We really should make sure the code is correct even if that requires
>> additional locks (or proper understanding of which locks protect what).
> 
> Thanks Claudio.  That is exactly the point I'm trying to make.

What about to understand how locking works in this area before
make any conclusions?

Don’t worry, I don't want to pushing this diff anymore.