From: Alexander Bluhm Subject: Re: Unsplice socket before start the teardown To: Vitaliy Makkoveev Cc: tech@openbsd.org Date: Fri, 16 Feb 2024 00:30:26 +0100 On Thu, Feb 15, 2024 at 09:12:52PM +0300, Vitaliy Makkoveev wrote: > > > > On 15 Feb 2024, at 20:08, Alexander Bluhm wrote: > > > > On Thu, Feb 15, 2024 at 07:29:53PM +0300, Vitaliy Makkoveev wrote: > >> On Thu, Feb 15, 2024 at 01:20:15PM +0100, Alexander Bluhm wrote: > >>> On Thu, Feb 15, 2024 at 12:28:28PM +0300, Vitaliy Makkoveev wrote: > >>>> Use barriers instead of `ssp_idleto' task and timeout(9) > >>>> re-initialization to wait concurrent somove() and soidle(). > >>> > >>> Why do you move the unsplice code before the solinger sleep? > >>> > >>> Are you sure that you don't change the semantics? Is all data still > >>> spliced before the connection closes? > >>> > >>> bluhm > >>> > >> > >> The socket is inaccessible from userland, but SO_LINGER keeps it alive, > >> mostly to perform transmission of pending data. The new transmissions > >> and the reception are impossible. However, this is not true for spliced > >> sockets because they are still accessible by the spliced peer, but not > >> forever. > >> > >> IIRC, only tcp_ctlinput() will wakeup us, so we leave this loop after > >> TCP_LINGERTIME or if the remote peer was disconnected. > >> > >> So, for the spliced case, you can receive data and awake somove(), but > >> reach `so_linger' timeout and follow the destruction path. Or you could > >> pru_send() data to dying `sosp' which already reached `so_linger' > >> timeout. You don't know is your spliced peer closed, so I see no > >> difference, will it be unspliced now or after TCP_LINGERTIME seconds. > >> > >> Since the pru_detach() is not called, the remote also doesn't know that > >> socket is closed. > >> > >> I wanted to move it from sofree(). Since I see no differences, the idea > >> to do pru_detach() after disconnect and unsplice looked good. > >> > >> > >> This diff keeps current notation, so the unsplice moved before sofree(). > > > > As there are a lot of corner cases that I have fixed in the years > > before, I fear that movin code around may change the semantics. So > > I like this versiom much more, I have thrown it on regress machine. > > > > Before sofree and unsplice was called from in_pcbdetach(). Are you > > sure that this is no longer necessary? TCP can get a reset and > > call tcp_close(). It has to unsplice then. > > in_pcbdetach() detaches and destroys PCB, but keeps socket alive. It > sets so_pcb to NULL, but keeps SS_NOFDREF clean, so you return just > after enter: > > sofree(struct socket *so, int keep_lock) > { > int persocket = solock_persocket(so); > > soassertlocked(so); > > if (so->so_pcb || (so->so_state & SS_NOFDREF) == 0) { > if (!keep_lock) > sounlock(so); > return; > } > > Otherwise, this this will be use-after-free issue, because the file > descriptor is still accessible from userland. Crash during regress START sys/kern/sosplice/loop 2024-02-15T20:42:50Z rm -f a.out [Ee]rrs mklog *.core y.tab.h *.log ktrace.out stamp-* ==== run-chain-ipv4-prototcp-count1 ==== ulimit -n 500; SUDO="" perl /usr/src/regress/sys/kern/sosplice/loop/chain.pl -4 -c 1 -p tcp ==== run-chain-ipv4-prototcp-count2 ==== ulimit -n 500; SUDO="" perl /usr/src/regress/sys/kern/sosplice/loop/chain.pl -4 -c 2 -p tcp Timeout, server ot1 not responding. [-- MARK -- Thu Feb 15 21:40:00 2024] panic: uvm_fault(0xd7b63aa8, 0x0, 0, 1) -> e Stopped at db_enter+0x4: popl %ebp TID PID UID PRFLAGS PFLAGS CPU COMMAND *294092 68423 0 0x18000002 0 3 perl db_enter() at db_enter+0x4 panic(d0c61049) at panic+0x7a kpageflttrap(f5f0f66c,0) at kpageflttrap+0x137 trap(f5f0f66c) at trap+0x25f calltrap() at calltrap+0xc solock_persocket(d76cb914) at solock_persocket+0xb sofree(d76cb914,0) at sofree+0x15 soclose(d76cb914,0) at soclose+0x298 soo_close(d7820ee0,d76edd1c) at soo_close+0x1e fdrop(d7820ee0,d76edd1c) at fdrop+0x6a closef(d7820ee0,d76edd1c) at closef+0x9d fdrelease(d76edd1c,6) at fdrelease+0x7a sys_close(d76edd1c,f5f0f7f0,f5f0f7e8) at sys_close+0x54 syscall(f5f0f830) at syscall+0x55e https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. ddb{3}> show panic *cpu3: uvm_fault(0xd7b63aa8, 0x0, 0, 1) -> e ddb{3}> trace db_enter() at db_enter+0x4 panic(d0c61049) at panic+0x7a kpageflttrap(f5f0f66c,0) at kpageflttrap+0x137 trap(f5f0f66c) at trap+0x25f calltrap() at calltrap+0xc solock_persocket(d76cb914) at solock_persocket+0xb sofree(d76cb914,0) at sofree+0x15 soclose(d76cb914,0) at soclose+0x298 soo_close(d7820ee0,d76edd1c) at soo_close+0x1e fdrop(d7820ee0,d76edd1c) at fdrop+0x6a closef(d7820ee0,d76edd1c) at closef+0x9d fdrelease(d76edd1c,6) at fdrelease+0x7a sys_close(d76edd1c,f5f0f7f0,f5f0f7e8) at sys_close+0x54 syscall(f5f0f830) at syscall+0x55e Xsyscall_untramp() at Xsyscall_untramp+0xa9 end of kernel ddb{3}> show register ds 0x10 es 0x10 fs 0x20 gs 0 edi 0xd0c61049 acx100_txpower_maxim+0x17d2d esi 0 ebp 0xf5f0f5e4 ebx 0xf55435d8 edx 0x3fd ecx 0x2543f619 eax 0x2d eip 0xd0acde54 db_enter+0x4 cs 0x8 eflags 0x202 esp 0xf5f0f5e4 ss 0x10 db_enter+0x4: popl %ebp ddb{3}> ps PID TID PPID UID S FLAGS WAIT COMMAND *68423 294092 1566 0 7 0x18000002 perl 1566 381024 50079 0 3 0x810008a sigsusp sh 50079 221897 51550 0 3 0x1810008a sigsusp make 51550 141637 42441 0 3 0x810008a sigsusp sh 11165 227136 0 0 3 0x14200 bored sosplice 42441 172021 38678 0 3 0x1810008a sigsusp make 98739 505918 71284 0 3 0x18100082 piperd gzip 71284 40345 38678 0 3 0x8100082 piperd pax 38678 326846 51296 0 3 0x18000082 piperd perl 51296 212340 99795 0 3 0x810008a sigsusp ksh 99795 178440 68741 0 3 0x1800009a kqread sshd 52700 433251 1 0 3 0x18100083 ttyin getty 45855 260933 1 0 3 0x18100083 ttyin getty 39444 214411 1 0 3 0x18100083 ttyin getty 51123 462323 1 0 3 0x18100083 ttyin getty 92546 126208 1 0 3 0x18100083 ttyin getty 10113 456162 1 0 3 0x18100083 ttyin getty 31520 477910 1 0 3 0x18100098 kqread cron 42614 426049 1 99 3 0x19100090 kqread sndiod 80379 237429 1 110 3 0x18100090 kqread sndiod 95484 403418 93616 95 3 0x19100092 kqread smtpd 85783 51147 93616 103 3 0x19100092 kqread smtpd 95595 499204 93616 95 3 0x19100092 kqread smtpd 26227 51221 93616 95 3 0x18100092 kqread smtpd 97457 238234 93616 95 3 0x19100092 kqread smtpd 75053 519968 93616 95 3 0x19100092 kqread smtpd 93616 261812 1 0 3 0x18100080 kqread smtpd 17706 445612 30932 91 3 0x18000092 kqread snmpd_metrics 30932 420221 1 0 3 0x18100080 kqread snmpd 40427 410777 1 91 3 0x19100092 kqread snmpd 68741 355001 1 0 3 0x18000088 kqread sshd 2780 263124 0 0 3 0x14200 acct acct 13887 496736 0 0 3 0x14280 nfsidl nfsio 10297 95168 0 0 3 0x14280 nfsidl nfsio 88712 206326 0 0 3 0x14280 nfsidl nfsio 22183 22208 0 0 3 0x14280 nfsidl nfsio 84524 154609 1 0 3 0x18100080 kqread ntpd 40420 405330 51338 83 3 0x18100092 kqread ntpd 51338 432705 1 83 3 0x19100092 kqread ntpd 6655 208118 28970 74 3 0x19100092 bpf pflogd 28970 27438 1 0 3 0x18000080 netio pflogd 91899 307352 42956 73 3 0x19100090 kqread syslogd 42956 136252 1 0 3 0x18100082 netio syslogd 67690 26924 99660 77 3 0x18100092 kqread dhcpleased 71323 300988 99660 77 3 0x18100092 kqread dhcpleased 99660 442179 1 0 3 0x18000080 kqread dhcpleased 89994 486637 91226 115 3 0x18100092 kqread slaacd 89258 188245 91226 115 3 0x18100092 kqread slaacd 91226 179241 1 0 3 0x18100080 kqread slaacd 80996 220821 0 0 3 0x14200 bored smr 94808 343068 0 0 3 0x14200 pgzero zerothread 10643 490789 0 0 3 0x14200 aiodoned aiodoned 38625 478999 0 0 3 0x14200 syncer update 96730 41784 0 0 3 0x14200 cleaner cleaner 75318 433043 0 0 3 0x14200 reaper reaper 15645 104664 0 0 3 0x14200 pgdaemon pagedaemon 42378 487419 0 0 3 0x14200 bored wsdisplay0 3199 422720 0 0 3 0x14200 usbtsk usbtask 91003 277263 0 0 3 0x14200 usbatsk usbatsk 98115 126124 0 0 3 0x14200 bored sensors 28530 399295 0 0 3 0x40014200 acpi0 acpi0 35845 415026 0 0 7 0x40014200 idle7 60184 31409 0 0 7 0x40014200 idle6 65844 419059 0 0 7 0x40014200 idle5 8954 38915 0 0 7 0x40014200 idle4 74670 191490 0 0 3 0x40014200 idle3 64306 222040 0 0 7 0x40014200 idle2 1038 44274 0 0 7 0x40014200 idle1 56328 94044 0 0 3 0x14200 bored softnet3 24336 335491 0 0 3 0x14200 bored softnet2 34788 411987 0 0 3 0x14200 bored softnet1 29367 155662 0 0 3 0x14200 bored softnet0 37233 511810 0 0 3 0x14200 bored systqmp 76004 97209 0 0 3 0x14200 bored systq 62673 110815 0 0 3 0x14200 tmoslp softclockmp 78786 332800 0 0 3 0x40014200 tmoslp softclock 22207 278488 0 0 7 0x40014200 idle0 41103 513492 0 0 3 0x14200 kmalloc kmthread 1 520550 0 0 3 0x8000082 wait init 0 0 -1 0 3 0x10200 scheduler swapper ddb{3}> show locks exclusive rwlock netlock r = 0 (0xd0ec12f8) ddb{3}> show uvm Current UVM status: pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12 736543 VM pages: 14118 active, 95046 inactive, 1 wired, 496466 free (62063 zero) min 10% (25) anon, 10% (25) vnode, 5% (12) vtext freemin=24551, free-target=32734, inactive-target=0, wired-max=245514 faults=242439947, traps=247588368, intrs=2185812, ctxswitch=68836215 fpuswitch=1881578 softint=7236222, syscalls=563057153, kmapent=84 fault counts: noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0 ok relocks(total)=262758(273410), anget(retries)=97537690(0), amapcopy=47681050 neighbor anon/obj pg=22588133/68129238, gets(lock/unlock)=19423016/274757 cases: anon=90493043, anoncow=7044647, obj=17267032, prcopy=2143985, przero=125011129 daemon and swap counts: woke=0, revs=0, scans=0, obscans=0, anscans=0 busy=0, freed=0, reactivate=0, deactivate=0 pageouts=0, pending=0, nswget=0 nswapdev=1 swpages=824403, swpginuse=0, swpgonly=0 paging=0 kernel pointers: objs(kern)=0xd0fd32fc ddb{3}> mach ddbcpu 0 Stopped at db_enter+0x4: popl %ebp db_enter() at db_enter+0x4 i386_ipi_handler() at i386_ipi_handler+0x3f Xipi_untramp() at Xipi_untramp+0xc2 acpicpu_idle() at acpicpu_idle+0xed cpu_idle_cycle() at cpu_idle_cycle+0xc ddb{3}> mach ddbcpu 1 Stopped at db_enter+0x4: popl %ebp db_enter() at db_enter+0x4 i386_ipi_handler() at i386_ipi_handler+0x3f Xipi_untramp() at Xipi_untramp+0xc2 acpicpu_idle() at acpicpu_idle+0xed cpu_idle_cycle() at cpu_idle_cycle+0xc ddb{1}> mach ddbcpu 2 Stopped at db_enter+0x4: popl %ebp db_enter() at db_enter+0x4 i386_ipi_handler() at i386_ipi_handler+0x3f Xipi_untramp() at Xipi_untramp+0xc2 acpicpu_idle() at acpicpu_idle+0xed cpu_idle_cycle() at cpu_idle_cycle+0xc