Download raw body.
Don't take solock() in soreceive() for tcp(4) sockets
On Tue, Apr 16, 2024 at 02:33:34PM +0300, Vitaliy Makkoveev wrote:
> On Tue, Apr 16, 2024 at 01:36:40PM +0300, Vitaliy Makkoveev wrote:
> > Make the reception locking fine grained too. pru_rcvd() requires
> > solock(), so take it around. I don't know is the unlocked connection
> > state SS_ bits check safe, so take `sb_mtx' around corresponding
> > `so_state' modifications. If this serialization is not required, it
> > could be reworked with another diff later.
> >
> > So, the reception of all the inet sockets moved out of netlock.
> >
>
> Sorry, this is the right diff.
With this diff my i386 regress machine crashed. It was somewhere
in regress/usr.sbin/syslogd.
==== run-args-client-tls-tcp.pl ====
time SUDO= KTRACE= SYSLOGD= perl -I/usr/src/regress/usr.sbin/syslogd -I/usr/src/regress/usr.sbin/syslogd/obj /usr/src/regress/usr.sbin/syslogd/syslogd.pl /usr/src/regress/usr.sbin/syslogd/args-client-tls-tcp.pl
Timeout, server ot1 not responding.
There are many messages:
splassert: sbmtxassertlocked: want 0 have 1
Starting stack trace...
sbdrop(0,1,d0cf1c14) at sbdrop+0x53
sbdrop(f67cbb84,f67cbbf0,17) at sbdrop+0x53
sbflush(f67cbb84,f67cbbf0) at sbflush+0x40
tcp_dodisconnect(d8af86f4) at tcp_dodisconnect+0x47
tcp_disconnect(f67cbb84) at tcp_disconnect+0x2b
soclose(f67cbb84,0) at soclose+0x6a
soo_close(d7ac2074,f60ed1c0) at soo_close+0x1e
fdrop(d7ac2074,f60ed1c0) at fdrop+0x6a
closef(d7ac2074,f60ed1c0) at closef+0x9d
fdrelease(f60ed1c0,3) at fdrelease+0x7a
sys_close(f60ed1c0,f6b00ce0,f6b00cd8) at sys_close+0x54
syscall(f6b00d20) at syscall+0x41d
Xsyscall_untramp() at Xsyscall_untramp+0xa9
end of kernel
End of stack trace.
Final output:
splassert: sbmtxassertlocked: want 0 have 1
Starting stack trace...
sbdrop(0,1,d0cf1c14) at sbdrop+0x53
sbdrop(d7706740,d77067ac,18) at sbdrop+0x53
sbflush(d7706740,d77067ac) at sbflush+0x40
tcp_dodisconnect(d88858c0) at tcp_dodisconnect+0x47
tcp_disconnect(d7706740) at tcp_disconnect+0x2b
soclose(d7706740,0) at soclose+0x6a
soo_close(d782f700,d7a0cb20) at soo_close+0x1e
fdrop(d782f700,d7a0cb20) at fdrop+0x6a
closef(d782f700,d7a0cb20) at closef+0x9d
fdfree(d7a0cb20) at fdfree+0x5c
exit1(d7a0cb20,0,0,1) at exit1+0x1cc
sys_exit(d7a0cb20,f6014c30,f6014c28) at sys_exit+0x14
syscall(f6014c70) at syscall+0x3f4
Xsyscall_untramp() at Xsyscall_untramp+0xa9
end of kernel
End of stack trace.
ppaniacn:i c : u v m _ f a u l t ( 0 x f8e 3 c 0 6 0, 0x0 , 0 , 1) - > e
Stopped at db_enter+0x4: popl %ebp
TID PID UID PRFLAGS PFLAGS CPU COMMAND
63476 68916 73 0x19100010 0 3 syslogd
*144380 89910 0 0x18000002 0 2K ttylog
db_enter() at db_enter+0x4
panic(d0c61c69) at panic+0x7a
kpageflttrap(f5fdaeec,0) at kpageflttrap+0x137
trap(f5fdaeec) at trap+0x25f
calltrap() at calltrap+0xc
docopyf() at docopyf+0x5
ptcread(600,f5fdb100,0) at ptcread+0x1a7
spec_read(f5fdb070) at spec_read+0x80
VOP_READ(d6ba2c00,f5fdb100,0,d7b879c4) at VOP_READ+0x3c
vn_read(d76fd774,f5fdb100,0) at vn_read+0x95
dofilereadv(d7a0cb20,5,f5fdb100,0,f5fdb188) at dofilereadv+0xb5
sys_read(d7a0cb20,f5fdb190,f5fdb188) at sys_read+0x49
syscall(f5fdb1d0) at syscall+0x41d
Xsyscall_untramp() at Xsyscall_untramp+0xa9
end of kernel
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports. Insufficient info makes it difficult to find and fix bugs.
ddb{2}> [-- MARK -- Wed Apr 17 03:45:00 2024]
ddb{2}> show panic
*cpu3: uvm_fault(0xf8e3c154, 0x0, 0, 2) -> e
cpu2: uvm_fault(0xf8e3c060, 0x0, 0, 1) -> e
ddb{2}> trace
db_enter() at db_enter+0x5
kpageflttrap(f5fdaeec,0) at kpageflttrap+0x137
trap(f5fdaeec) at trap+0x25f
calltrap() at calltrap+0xc
docopyf() at docopyf+0x5
ptcread(600,f5fdb100,0) at ptcread+0x1a7
spec_read(f5fdb070) at spec_read+0x80
VOP_READ(d6ba2c00,f5fdb100,0,d7b879c4) at VOP_READ+0x3c
vn_read(d76fd774,f5fdb100,0) at vn_read+0x95
dofilereadv(d7a0cb20,5,f5fdb100,0,f5fdb188) at dofilereadv+0xb5
sys_read(d7a0cb20,f5fdb190,f5fdb188) at sys_read+0x49
syscall(f5fdb1d0) at syscall+0x41d
Xsyscall_untramp() at Xsyscall_untramp+0xa9
end of kernel
ddb{2}> show register
ds 0x10
es 0x10
fs 0x20
gs 0
edi 0xd0c61c69 mtl_rcs_offsets+0x163ee
esi 0xf55435d8
ebp 0xf5fdae80
ebx 0xf553d5d8
edx 0x3fd
ecx 0xc52ed985
eax 0x2d
eip 0xd064f575 db_enter+0x5
cs 0x8
eflags 0x202
esp 0xf5fdae68
ss 0x10
db_enter+0x5: ret
ddb{2}> mach ddbcpu 3
Stopped at db_enter+0x4: popl %ebp
db_enter() at db_enter+0x4
i386_ipi_handler() at i386_ipi_handler+0x3f
Xipi_untramp() at Xipi_untramp+0xc2
i386_bus_space_io_write_1(3f8,0,20) at i386_bus_space_io_write_1+0xe
comcnputc(800,20) at comcnputc+0xb3
cnputc(20) at cnputc+0x21
db_putchar(75) at db_putchar+0x277
kprintf() at kprintf+0xecd
db_printf(d0cbd686) at db_printf+0x37
panic(d0c61c69) at panic+0x69
kpageflttrap(f605200c,0) at kpageflttrap+0x137
trap(f605200c) at trap+0x25f
calltrap() at calltrap+0xc
putc(30,d7fd5648) at putc+0x3b
ddb{3}>
ddb{3}> show register
ds 0x10
es 0x10
fs 0x20
gs 0
edi 0
esi 0xf5542ff8
ebp 0xf6051e28
ebx 0x6
edx 0x3f8
ecx 0x6
eax 0xd02ef560 i386_ipi_db
eip 0xd064f574 db_enter+0x4
cs 0x50
eflags 0x200246 __kernel_base_phys+0x172
esp 0xf6051e28
ss 0x10
db_enter+0x4: popl %ebp
ddb{3}> ps
PID TID PPID UID S FLAGS WAIT COMMAND
66495 162740 92179 0 3 0x18000080 nanoslp perl
6874 83487 66761 73 3 0x8100082 sbwait dd
66761 388162 9478 73 3 0x810008a sigsusp sh
*68916 63476 9478 73 7 0x19100010 syslogd
54422 376604 92179 0 3 0x18000080 nanoslp perl
9478 184642 92179 0 3 0x18100082 sbwait syslogd
47187 273616 92179 0 3 0x18000082 ttyin ttylog
89910 144380 92179 0 7 0x18000002 ttylog
92179 241768 65780 0 3 0x18000082 nanoslp perl
65780 244989 79759 0 3 0x810008a sigsusp sh
79759 19418 13992 0 3 0x1810008a sigsusp make
13992 499689 49023 0 3 0x810008a sigsusp sh
49023 344293 81972 0 3 0x1810008a sigsusp make
82437 262496 48222 91 3 0x18000092 kqread snmpd_metrics
83156 445484 48222 91 3 0x19100092 kqread snmpd
48222 385347 1 0 3 0x18100080 kqread snmpd
28484 337167 0 0 3 0x14200 bored sosplice
10920 105338 89774 0 3 0x18100082 piperd gzip
89774 101197 81972 0 3 0x8100082 piperd pax
81972 103305 58227 0 3 0x18000082 piperd perl
58227 19639 14807 0 3 0x810008a sigsusp ksh
14807 478472 91506 0 3 0x1800009a kqread sshd
22912 97132 1 0 3 0x18100083 ttyin getty
59734 144669 1 0 3 0x18100083 ttyin getty
88912 109485 1 0 3 0x18100083 ttyin getty
48436 497769 1 0 3 0x18100083 ttyin getty
4458 357659 1 0 3 0x18100083 ttyin getty
37567 168856 1 0 3 0x18100083 ttyin getty
5132 165217 1 0 3 0x18100098 kqread cron
69936 89460 1 99 3 0x19100090 kqread sndiod
88445 311622 1 110 3 0x18100090 kqread sndiod
87160 271303 56343 95 3 0x19100092 kqread smtpd
24080 130340 56343 103 3 0x19100092 kqread smtpd
16999 297795 56343 95 3 0x19100092 kqread smtpd
16674 305538 56343 95 3 0x18100092 kqread smtpd
10354 128758 56343 95 3 0x19100092 kqread smtpd
83399 429392 56343 95 3 0x19100092 kqread smtpd
56343 19986 1 0 3 0x18100080 kqread smtpd
91506 514211 1 0 3 0x18000088 kqread sshd
47560 286908 0 0 3 0x14200 acct acct
6834 234040 0 0 3 0x14280 nfsidl nfsio
9034 244410 0 0 3 0x14280 nfsidl nfsio
19584 122196 0 0 3 0x14280 nfsidl nfsio
47951 110834 0 0 3 0x14280 nfsidl nfsio
4406 311068 1 0 3 0x18100080 kqread ntpd
16297 155892 89542 83 3 0x18100092 kqread ntpd
89542 25530 1 83 3 0x19100092 kqread ntpd
62107 309092 5720 74 3 0x19100092 bpf pflogd
5720 510680 1 0 3 0x18000080 sbwait pflogd
3891 404875 22292 77 3 0x18100092 kqread dhcpleased
6467 50846 22292 77 3 0x18100092 kqread dhcpleased
22292 304417 1 0 3 0x18000080 kqread dhcpleased
97699 95954 93975 115 3 0x18100092 kqread slaacd
75700 397979 93975 115 3 0x18100092 kqread slaacd
93975 188749 1 0 3 0x18100080 kqread slaacd
97739 487380 0 0 3 0x14200 bored smr
53597 490227 0 0 3 0x14200 pgzero zerothread
51391 230609 0 0 3 0x14200 aiodoned aiodoned
20123 257987 0 0 3 0x14200 syncer update
69 169525 0 0 3 0x14200 cleaner cleaner
81897 226934 0 0 3 0x14200 reaper reaper
17130 96600 0 0 3 0x14200 pgdaemon pagedaemon
80528 104662 0 0 3 0x14200 bored wsdisplay0
66418 121964 0 0 3 0x14200 usbtsk usbtask
81259 113739 0 0 3 0x14200 usbatsk usbatsk
55149 274075 0 0 3 0x14200 bored sensors
52509 425472 0 0 3 0x40014200 acpi0 acpi0
99716 322609 0 0 7 0x40014200 idle7
13816 255518 0 0 7 0x40014200 idle6
19438 516832 0 0 7 0x40014200 idle5
12990 165070 0 0 7 0x40014200 idle4
94588 16061 0 0 3 0x40014200 idle3
24348 51863 0 0 3 0x40014200 idle2
15779 458677 0 0 7 0x40014200 idle1
86291 273017 0 0 3 0x14200 bored softnet3
42226 170805 0 0 3 0x14200 bored softnet2
59793 424398 0 0 3 0x14200 bored softnet1
15121 321698 0 0 3 0x14200 bored softnet0
96660 14196 0 0 3 0x14200 bored systqmp
1086 472575 0 0 3 0x14200 bored systq
98808 168088 0 0 3 0x14200 tmoslp softclockmp
37114 12489 0 0 3 0x40014200 tmoslp softclock
34875 214046 0 0 7 0x40014200 idle0
58685 477216 0 0 3 0x14200 kmalloc kmthread
1 54597 0 0 3 0x8000082 wait init
0 0 -1 0 3 0x10200 scheduler swapper
ddb{3}> show uvm
Current UVM status:
pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
736533 VM pages: 17336 active, 89088 inactive, 1 wired, 455624 free (56963 zero)
min 10% (25) anon, 10% (25) vnode, 5% (12) vtext
freemin=24551, free-target=32734, inactive-target=0, wired-max=245511
faults=324856630, traps=352534994, intrs=3454558, ctxswitch=170378607 fpuswitch=2308330
softint=30748978, syscalls=760351157, kmapent=535
fault counts:
noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
ok relocks(total)=361573(373193), anget(retries)=146838497(0), amapcopy=73104379
neighbor anon/obj pg=34140648/115322218, gets(lock/unlock)=32462604/374403
cases: anon=133497967, anoncow=13340530, obj=28897645, prcopy=3552128, przero=145087858
daemon and swap counts:
woke=0, revs=0, scans=0, obscans=0, anscans=0
busy=0, freed=0, reactivate=0, deactivate=0
pageouts=0, pending=0, nswget=0
nswapdev=1
swpages=824403, swpginuse=0, swpgonly=0 paging=0
kernel pointers:
objs(kern)=0xd0fd431c
ttylog is a program used by syslogd regress to redirect console.
I need this to test syslogd console logging. My suspicion is that
there is a race in kernel console logging code and all these splassert
warnings triggered the crash.
bluhm
Don't take solock() in soreceive() for tcp(4) sockets