Download raw body.
Unsplice socket before start the teardown
On Thu, Feb 15, 2024 at 09:12:52PM +0300, Vitaliy Makkoveev wrote:
>
>
> > On 15 Feb 2024, at 20:08, Alexander Bluhm <alexander.bluhm@gmx.net> wrote:
> >
> > On Thu, Feb 15, 2024 at 07:29:53PM +0300, Vitaliy Makkoveev wrote:
> >> On Thu, Feb 15, 2024 at 01:20:15PM +0100, Alexander Bluhm wrote:
> >>> On Thu, Feb 15, 2024 at 12:28:28PM +0300, Vitaliy Makkoveev wrote:
> >>>> Use barriers instead of `ssp_idleto' task and timeout(9)
> >>>> re-initialization to wait concurrent somove() and soidle().
> >>>
> >>> Why do you move the unsplice code before the solinger sleep?
> >>>
> >>> Are you sure that you don't change the semantics? Is all data still
> >>> spliced before the connection closes?
> >>>
> >>> bluhm
> >>>
> >>
> >> The socket is inaccessible from userland, but SO_LINGER keeps it alive,
> >> mostly to perform transmission of pending data. The new transmissions
> >> and the reception are impossible. However, this is not true for spliced
> >> sockets because they are still accessible by the spliced peer, but not
> >> forever.
> >>
> >> IIRC, only tcp_ctlinput() will wakeup us, so we leave this loop after
> >> TCP_LINGERTIME or if the remote peer was disconnected.
> >>
> >> So, for the spliced case, you can receive data and awake somove(), but
> >> reach `so_linger' timeout and follow the destruction path. Or you could
> >> pru_send() data to dying `sosp' which already reached `so_linger'
> >> timeout. You don't know is your spliced peer closed, so I see no
> >> difference, will it be unspliced now or after TCP_LINGERTIME seconds.
> >>
> >> Since the pru_detach() is not called, the remote also doesn't know that
> >> socket is closed.
> >>
> >> I wanted to move it from sofree(). Since I see no differences, the idea
> >> to do pru_detach() after disconnect and unsplice looked good.
> >>
> >>
> >> This diff keeps current notation, so the unsplice moved before sofree().
> >
> > As there are a lot of corner cases that I have fixed in the years
> > before, I fear that movin code around may change the semantics. So
> > I like this versiom much more, I have thrown it on regress machine.
> >
> > Before sofree and unsplice was called from in_pcbdetach(). Are you
> > sure that this is no longer necessary? TCP can get a reset and
> > call tcp_close(). It has to unsplice then.
>
> in_pcbdetach() detaches and destroys PCB, but keeps socket alive. It
> sets so_pcb to NULL, but keeps SS_NOFDREF clean, so you return just
> after enter:
>
> sofree(struct socket *so, int keep_lock)
> {
> int persocket = solock_persocket(so);
>
> soassertlocked(so);
>
> if (so->so_pcb || (so->so_state & SS_NOFDREF) == 0) {
> if (!keep_lock)
> sounlock(so);
> return;
> }
>
> Otherwise, this this will be use-after-free issue, because the file
> descriptor is still accessible from userland.
Crash during regress
START sys/kern/sosplice/loop 2024-02-15T20:42:50Z
rm -f a.out [Ee]rrs mklog *.core y.tab.h *.log ktrace.out stamp-*
==== run-chain-ipv4-prototcp-count1 ====
ulimit -n 500; SUDO="" perl /usr/src/regress/sys/kern/sosplice/loop/chain.pl -4 -c 1 -p tcp
==== run-chain-ipv4-prototcp-count2 ====
ulimit -n 500; SUDO="" perl /usr/src/regress/sys/kern/sosplice/loop/chain.pl -4 -c 2 -p tcp
Timeout, server ot1 not responding.
[-- MARK -- Thu Feb 15 21:40:00 2024]
panic: uvm_fault(0xd7b63aa8, 0x0, 0, 1) -> e
Stopped at db_enter+0x4: popl %ebp
TID PID UID PRFLAGS PFLAGS CPU COMMAND
*294092 68423 0 0x18000002 0 3 perl
db_enter() at db_enter+0x4
panic(d0c61049) at panic+0x7a
kpageflttrap(f5f0f66c,0) at kpageflttrap+0x137
trap(f5f0f66c) at trap+0x25f
calltrap() at calltrap+0xc
solock_persocket(d76cb914) at solock_persocket+0xb
sofree(d76cb914,0) at sofree+0x15
soclose(d76cb914,0) at soclose+0x298
soo_close(d7820ee0,d76edd1c) at soo_close+0x1e
fdrop(d7820ee0,d76edd1c) at fdrop+0x6a
closef(d7820ee0,d76edd1c) at closef+0x9d
fdrelease(d76edd1c,6) at fdrelease+0x7a
sys_close(d76edd1c,f5f0f7f0,f5f0f7e8) at sys_close+0x54
syscall(f5f0f830) at syscall+0x55e
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports. Insufficient info makes it difficult to find and fix bugs.
ddb{3}> show panic
*cpu3: uvm_fault(0xd7b63aa8, 0x0, 0, 1) -> e
ddb{3}> trace
db_enter() at db_enter+0x4
panic(d0c61049) at panic+0x7a
kpageflttrap(f5f0f66c,0) at kpageflttrap+0x137
trap(f5f0f66c) at trap+0x25f
calltrap() at calltrap+0xc
solock_persocket(d76cb914) at solock_persocket+0xb
sofree(d76cb914,0) at sofree+0x15
soclose(d76cb914,0) at soclose+0x298
soo_close(d7820ee0,d76edd1c) at soo_close+0x1e
fdrop(d7820ee0,d76edd1c) at fdrop+0x6a
closef(d7820ee0,d76edd1c) at closef+0x9d
fdrelease(d76edd1c,6) at fdrelease+0x7a
sys_close(d76edd1c,f5f0f7f0,f5f0f7e8) at sys_close+0x54
syscall(f5f0f830) at syscall+0x55e
Xsyscall_untramp() at Xsyscall_untramp+0xa9
end of kernel
ddb{3}> show register
ds 0x10
es 0x10
fs 0x20
gs 0
edi 0xd0c61049 acx100_txpower_maxim+0x17d2d
esi 0
ebp 0xf5f0f5e4
ebx 0xf55435d8
edx 0x3fd
ecx 0x2543f619
eax 0x2d
eip 0xd0acde54 db_enter+0x4
cs 0x8
eflags 0x202
esp 0xf5f0f5e4
ss 0x10
db_enter+0x4: popl %ebp
ddb{3}> ps
PID TID PPID UID S FLAGS WAIT COMMAND
*68423 294092 1566 0 7 0x18000002 perl
1566 381024 50079 0 3 0x810008a sigsusp sh
50079 221897 51550 0 3 0x1810008a sigsusp make
51550 141637 42441 0 3 0x810008a sigsusp sh
11165 227136 0 0 3 0x14200 bored sosplice
42441 172021 38678 0 3 0x1810008a sigsusp make
98739 505918 71284 0 3 0x18100082 piperd gzip
71284 40345 38678 0 3 0x8100082 piperd pax
38678 326846 51296 0 3 0x18000082 piperd perl
51296 212340 99795 0 3 0x810008a sigsusp ksh
99795 178440 68741 0 3 0x1800009a kqread sshd
52700 433251 1 0 3 0x18100083 ttyin getty
45855 260933 1 0 3 0x18100083 ttyin getty
39444 214411 1 0 3 0x18100083 ttyin getty
51123 462323 1 0 3 0x18100083 ttyin getty
92546 126208 1 0 3 0x18100083 ttyin getty
10113 456162 1 0 3 0x18100083 ttyin getty
31520 477910 1 0 3 0x18100098 kqread cron
42614 426049 1 99 3 0x19100090 kqread sndiod
80379 237429 1 110 3 0x18100090 kqread sndiod
95484 403418 93616 95 3 0x19100092 kqread smtpd
85783 51147 93616 103 3 0x19100092 kqread smtpd
95595 499204 93616 95 3 0x19100092 kqread smtpd
26227 51221 93616 95 3 0x18100092 kqread smtpd
97457 238234 93616 95 3 0x19100092 kqread smtpd
75053 519968 93616 95 3 0x19100092 kqread smtpd
93616 261812 1 0 3 0x18100080 kqread smtpd
17706 445612 30932 91 3 0x18000092 kqread snmpd_metrics
30932 420221 1 0 3 0x18100080 kqread snmpd
40427 410777 1 91 3 0x19100092 kqread snmpd
68741 355001 1 0 3 0x18000088 kqread sshd
2780 263124 0 0 3 0x14200 acct acct
13887 496736 0 0 3 0x14280 nfsidl nfsio
10297 95168 0 0 3 0x14280 nfsidl nfsio
88712 206326 0 0 3 0x14280 nfsidl nfsio
22183 22208 0 0 3 0x14280 nfsidl nfsio
84524 154609 1 0 3 0x18100080 kqread ntpd
40420 405330 51338 83 3 0x18100092 kqread ntpd
51338 432705 1 83 3 0x19100092 kqread ntpd
6655 208118 28970 74 3 0x19100092 bpf pflogd
28970 27438 1 0 3 0x18000080 netio pflogd
91899 307352 42956 73 3 0x19100090 kqread syslogd
42956 136252 1 0 3 0x18100082 netio syslogd
67690 26924 99660 77 3 0x18100092 kqread dhcpleased
71323 300988 99660 77 3 0x18100092 kqread dhcpleased
99660 442179 1 0 3 0x18000080 kqread dhcpleased
89994 486637 91226 115 3 0x18100092 kqread slaacd
89258 188245 91226 115 3 0x18100092 kqread slaacd
91226 179241 1 0 3 0x18100080 kqread slaacd
80996 220821 0 0 3 0x14200 bored smr
94808 343068 0 0 3 0x14200 pgzero zerothread
10643 490789 0 0 3 0x14200 aiodoned aiodoned
38625 478999 0 0 3 0x14200 syncer update
96730 41784 0 0 3 0x14200 cleaner cleaner
75318 433043 0 0 3 0x14200 reaper reaper
15645 104664 0 0 3 0x14200 pgdaemon pagedaemon
42378 487419 0 0 3 0x14200 bored wsdisplay0
3199 422720 0 0 3 0x14200 usbtsk usbtask
91003 277263 0 0 3 0x14200 usbatsk usbatsk
98115 126124 0 0 3 0x14200 bored sensors
28530 399295 0 0 3 0x40014200 acpi0 acpi0
35845 415026 0 0 7 0x40014200 idle7
60184 31409 0 0 7 0x40014200 idle6
65844 419059 0 0 7 0x40014200 idle5
8954 38915 0 0 7 0x40014200 idle4
74670 191490 0 0 3 0x40014200 idle3
64306 222040 0 0 7 0x40014200 idle2
1038 44274 0 0 7 0x40014200 idle1
56328 94044 0 0 3 0x14200 bored softnet3
24336 335491 0 0 3 0x14200 bored softnet2
34788 411987 0 0 3 0x14200 bored softnet1
29367 155662 0 0 3 0x14200 bored softnet0
37233 511810 0 0 3 0x14200 bored systqmp
76004 97209 0 0 3 0x14200 bored systq
62673 110815 0 0 3 0x14200 tmoslp softclockmp
78786 332800 0 0 3 0x40014200 tmoslp softclock
22207 278488 0 0 7 0x40014200 idle0
41103 513492 0 0 3 0x14200 kmalloc kmthread
1 520550 0 0 3 0x8000082 wait init
0 0 -1 0 3 0x10200 scheduler swapper
ddb{3}> show locks
exclusive rwlock netlock r = 0 (0xd0ec12f8)
ddb{3}> show uvm
Current UVM status:
pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
736543 VM pages: 14118 active, 95046 inactive, 1 wired, 496466 free (62063 zero)
min 10% (25) anon, 10% (25) vnode, 5% (12) vtext
freemin=24551, free-target=32734, inactive-target=0, wired-max=245514
faults=242439947, traps=247588368, intrs=2185812, ctxswitch=68836215 fpuswitch=1881578
softint=7236222, syscalls=563057153, kmapent=84
fault counts:
noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
ok relocks(total)=262758(273410), anget(retries)=97537690(0), amapcopy=47681050
neighbor anon/obj pg=22588133/68129238, gets(lock/unlock)=19423016/274757
cases: anon=90493043, anoncow=7044647, obj=17267032, prcopy=2143985, przero=125011129
daemon and swap counts:
woke=0, revs=0, scans=0, obscans=0, anscans=0
busy=0, freed=0, reactivate=0, deactivate=0
pageouts=0, pending=0, nswget=0
nswapdev=1
swpages=824403, swpginuse=0, swpgonly=0 paging=0
kernel pointers:
objs(kern)=0xd0fd32fc
ddb{3}> mach ddbcpu 0
Stopped at db_enter+0x4: popl %ebp
db_enter() at db_enter+0x4
i386_ipi_handler() at i386_ipi_handler+0x3f
Xipi_untramp() at Xipi_untramp+0xc2
acpicpu_idle() at acpicpu_idle+0xed
cpu_idle_cycle() at cpu_idle_cycle+0xc
ddb{3}> mach ddbcpu 1
Stopped at db_enter+0x4: popl %ebp
db_enter() at db_enter+0x4
i386_ipi_handler() at i386_ipi_handler+0x3f
Xipi_untramp() at Xipi_untramp+0xc2
acpicpu_idle() at acpicpu_idle+0xed
cpu_idle_cycle() at cpu_idle_cycle+0xc
ddb{1}> mach ddbcpu 2
Stopped at db_enter+0x4: popl %ebp
db_enter() at db_enter+0x4
i386_ipi_handler() at i386_ipi_handler+0x3f
Xipi_untramp() at Xipi_untramp+0xc2
acpicpu_idle() at acpicpu_idle+0xed
cpu_idle_cycle() at cpu_idle_cycle+0xc
Unsplice socket before start the teardown