From: Vitaliy Makkoveev Subject: Re: tcp(4): use per-sockbuf mutex to protect `so_snd' socket buffer To: Alexander Bluhm Cc: OpenBSD tech Date: Tue, 24 Dec 2024 19:59:38 +0300 > On 24 Dec 2024, at 19:49, Alexander Bluhm wrote: > > On Sat, Dec 21, 2024 at 08:16:05AM +0300, Vitaliy Makkoveev wrote: >> This diff only unlocks sosend() path, all the rest is still locked >> exclusively. >> >> Even for tcp(4) case, sosend() only checks `so_snd' free space and >> sleeps if necessary, actual buffer handling happening within exclusively >> locked PCB layer. In the PCB layer we don't need to protect read-only >> access to `so_snd' because corresponding read-write access is serialized >> by socket lock. >> >> sosend() needs to be serialized with somove(), so sotask() takes >> sblock() on `so_snd' of spliced socket. This discards previous >> protection of tcp(4) spliced sockets where solock() was used to prevent >> concurrent unsplicing. This scheme was used to avoid sleep in sofree(). >> >> However, sleep is sofree() is possible. We have two cases: >> >> 1. Socket was not yet accept(2)ed. Such sockets can't be accessed from >> the userland and can't be spliced. sofree() could be called only from >> PCB layer and don't need to sleep. >> >> 2. Socket was accepted. While called form PCB layer, sofree() ignores it >> because SS_NOFDREF bit is not set. Socket remains spliced, but without >> PCB. Sockets without PCB can't be accessed from PCB layer, only from >> userland. So, soclose()/sofree() thread needs to wait concurrent >> soclose()/sofree() thread for spliced socket as is is already done for >> udp(4) case. In such case it is safe to release solock() and sleep >> within sofree(). >> >> Note, I intentionally left all "if (dosolock)" dances as is. > > While running your diff though my performnce tests, I got this panic > twice. I have never seen it before, so I guess it is related. > > panic: softclock: invalid to_clock: 19668320 > Stopped at db_enter+0x14: popq %rbp > TID PID UID PRFLAGS PFLAGS CPU COMMAND > 120146 91986 89 0x1100012 0 1 relayd > 65124 13759 89 0x1100012 0 3 relayd > 33160 19438 0 0x14000 0x200 2 softnet0 > db_enter() at db_enter+0x14 > panic(ffffffff82883be3) at panic+0xdd > softclock(0) at softclock+0x1d5 > softintr_dispatch(0) at softintr_dispatch+0xe6 > Xsoftclock() at Xsoftclock+0x27 > acpicpu_idle() at acpicpu_idle+0x2b9 > sched_idle(ffffffff83593ff0) at sched_idle+0x298 > end trace frame: 0x0, count: 8 > https://www.openbsd.org/ddb.html describes the minimum info required in bug > reports. Insufficient info makes it difficult to find and fix bugs. > > ddb{0}> x/s version > version: OpenBSD 7.6-current (GENERIC.MP) #cvs : D2024.12.23.00.00.00: Tue Dec 24 14:53:39 CET 2024 > root@ot15.obsd-lab.genua.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP > > ddb{0}> show panic > *cpu0: softclock: invalid to_clock: 19668320 Interesting. Which test triggers this? I suspect sosplice.