Download raw body.
tcp(4): use per-sockbuf mutex to protect `so_snd' socket buffer
tcp(4): use per-sockbuf mutex to protect `so_snd' socket buffer
tcp(4): use per-sockbuf mutex to protect `so_snd' socket buffer
> On 24 Dec 2024, at 19:49, Alexander Bluhm <alexander.bluhm@gmx.net> wrote:
>
> On Sat, Dec 21, 2024 at 08:16:05AM +0300, Vitaliy Makkoveev wrote:
>> This diff only unlocks sosend() path, all the rest is still locked
>> exclusively.
>>
>> Even for tcp(4) case, sosend() only checks `so_snd' free space and
>> sleeps if necessary, actual buffer handling happening within exclusively
>> locked PCB layer. In the PCB layer we don't need to protect read-only
>> access to `so_snd' because corresponding read-write access is serialized
>> by socket lock.
>>
>> sosend() needs to be serialized with somove(), so sotask() takes
>> sblock() on `so_snd' of spliced socket. This discards previous
>> protection of tcp(4) spliced sockets where solock() was used to prevent
>> concurrent unsplicing. This scheme was used to avoid sleep in sofree().
>>
>> However, sleep is sofree() is possible. We have two cases:
>>
>> 1. Socket was not yet accept(2)ed. Such sockets can't be accessed from
>> the userland and can't be spliced. sofree() could be called only from
>> PCB layer and don't need to sleep.
>>
>> 2. Socket was accepted. While called form PCB layer, sofree() ignores it
>> because SS_NOFDREF bit is not set. Socket remains spliced, but without
>> PCB. Sockets without PCB can't be accessed from PCB layer, only from
>> userland. So, soclose()/sofree() thread needs to wait concurrent
>> soclose()/sofree() thread for spliced socket as is is already done for
>> udp(4) case. In such case it is safe to release solock() and sleep
>> within sofree().
>>
>> Note, I intentionally left all "if (dosolock)" dances as is.
>
> While running your diff though my performnce tests, I got this panic
> twice. I have never seen it before, so I guess it is related.
>
> panic: softclock: invalid to_clock: 19668320
> Stopped at db_enter+0x14: popq %rbp
> TID PID UID PRFLAGS PFLAGS CPU COMMAND
> 120146 91986 89 0x1100012 0 1 relayd
> 65124 13759 89 0x1100012 0 3 relayd
> 33160 19438 0 0x14000 0x200 2 softnet0
> db_enter() at db_enter+0x14
> panic(ffffffff82883be3) at panic+0xdd
> softclock(0) at softclock+0x1d5
> softintr_dispatch(0) at softintr_dispatch+0xe6
> Xsoftclock() at Xsoftclock+0x27
> acpicpu_idle() at acpicpu_idle+0x2b9
> sched_idle(ffffffff83593ff0) at sched_idle+0x298
> end trace frame: 0x0, count: 8
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports. Insufficient info makes it difficult to find and fix bugs.
>
> ddb{0}> x/s version
> version: OpenBSD 7.6-current (GENERIC.MP) #cvs : D2024.12.23.00.00.00: Tue Dec 24 14:53:39 CET 2024
> root@ot15.obsd-lab.genua.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> ddb{0}> show panic
> *cpu0: softclock: invalid to_clock: 19668320
Interesting. Which test triggers this? I suspect sosplice.
tcp(4): use per-sockbuf mutex to protect `so_snd' socket buffer
tcp(4): use per-sockbuf mutex to protect `so_snd' socket buffer
tcp(4): use per-sockbuf mutex to protect `so_snd' socket buffer