From: Alexander Bluhm Subject: Re: Don't take solock() in soreceive() for tcp(4) sockets To: Vitaliy Makkoveev Cc: tech@openbsd.org Date: Wed, 17 Apr 2024 14:19:42 +0200 On Tue, Apr 16, 2024 at 02:33:34PM +0300, Vitaliy Makkoveev wrote: > On Tue, Apr 16, 2024 at 01:36:40PM +0300, Vitaliy Makkoveev wrote: > > Make the reception locking fine grained too. pru_rcvd() requires > > solock(), so take it around. I don't know is the unlocked connection > > state SS_ bits check safe, so take `sb_mtx' around corresponding > > `so_state' modifications. If this serialization is not required, it > > could be reworked with another diff later. > > > > So, the reception of all the inet sockets moved out of netlock. > > > > Sorry, this is the right diff. With this diff my i386 regress machine crashed. It was somewhere in regress/usr.sbin/syslogd. ==== run-args-client-tls-tcp.pl ==== time SUDO= KTRACE= SYSLOGD= perl -I/usr/src/regress/usr.sbin/syslogd -I/usr/src/regress/usr.sbin/syslogd/obj /usr/src/regress/usr.sbin/syslogd/syslogd.pl /usr/src/regress/usr.sbin/syslogd/args-client-tls-tcp.pl Timeout, server ot1 not responding. There are many messages: splassert: sbmtxassertlocked: want 0 have 1 Starting stack trace... sbdrop(0,1,d0cf1c14) at sbdrop+0x53 sbdrop(f67cbb84,f67cbbf0,17) at sbdrop+0x53 sbflush(f67cbb84,f67cbbf0) at sbflush+0x40 tcp_dodisconnect(d8af86f4) at tcp_dodisconnect+0x47 tcp_disconnect(f67cbb84) at tcp_disconnect+0x2b soclose(f67cbb84,0) at soclose+0x6a soo_close(d7ac2074,f60ed1c0) at soo_close+0x1e fdrop(d7ac2074,f60ed1c0) at fdrop+0x6a closef(d7ac2074,f60ed1c0) at closef+0x9d fdrelease(f60ed1c0,3) at fdrelease+0x7a sys_close(f60ed1c0,f6b00ce0,f6b00cd8) at sys_close+0x54 syscall(f6b00d20) at syscall+0x41d Xsyscall_untramp() at Xsyscall_untramp+0xa9 end of kernel End of stack trace. Final output: splassert: sbmtxassertlocked: want 0 have 1 Starting stack trace... sbdrop(0,1,d0cf1c14) at sbdrop+0x53 sbdrop(d7706740,d77067ac,18) at sbdrop+0x53 sbflush(d7706740,d77067ac) at sbflush+0x40 tcp_dodisconnect(d88858c0) at tcp_dodisconnect+0x47 tcp_disconnect(d7706740) at tcp_disconnect+0x2b soclose(d7706740,0) at soclose+0x6a soo_close(d782f700,d7a0cb20) at soo_close+0x1e fdrop(d782f700,d7a0cb20) at fdrop+0x6a closef(d782f700,d7a0cb20) at closef+0x9d fdfree(d7a0cb20) at fdfree+0x5c exit1(d7a0cb20,0,0,1) at exit1+0x1cc sys_exit(d7a0cb20,f6014c30,f6014c28) at sys_exit+0x14 syscall(f6014c70) at syscall+0x3f4 Xsyscall_untramp() at Xsyscall_untramp+0xa9 end of kernel End of stack trace. ppaniacn:i c : u v m _ f a u l t ( 0 x f8e 3 c 0 6 0, 0x0 , 0 , 1) - > e Stopped at db_enter+0x4: popl %ebp TID PID UID PRFLAGS PFLAGS CPU COMMAND 63476 68916 73 0x19100010 0 3 syslogd *144380 89910 0 0x18000002 0 2K ttylog db_enter() at db_enter+0x4 panic(d0c61c69) at panic+0x7a kpageflttrap(f5fdaeec,0) at kpageflttrap+0x137 trap(f5fdaeec) at trap+0x25f calltrap() at calltrap+0xc docopyf() at docopyf+0x5 ptcread(600,f5fdb100,0) at ptcread+0x1a7 spec_read(f5fdb070) at spec_read+0x80 VOP_READ(d6ba2c00,f5fdb100,0,d7b879c4) at VOP_READ+0x3c vn_read(d76fd774,f5fdb100,0) at vn_read+0x95 dofilereadv(d7a0cb20,5,f5fdb100,0,f5fdb188) at dofilereadv+0xb5 sys_read(d7a0cb20,f5fdb190,f5fdb188) at sys_read+0x49 syscall(f5fdb1d0) at syscall+0x41d Xsyscall_untramp() at Xsyscall_untramp+0xa9 end of kernel https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. ddb{2}> [-- MARK -- Wed Apr 17 03:45:00 2024] ddb{2}> show panic *cpu3: uvm_fault(0xf8e3c154, 0x0, 0, 2) -> e cpu2: uvm_fault(0xf8e3c060, 0x0, 0, 1) -> e ddb{2}> trace db_enter() at db_enter+0x5 kpageflttrap(f5fdaeec,0) at kpageflttrap+0x137 trap(f5fdaeec) at trap+0x25f calltrap() at calltrap+0xc docopyf() at docopyf+0x5 ptcread(600,f5fdb100,0) at ptcread+0x1a7 spec_read(f5fdb070) at spec_read+0x80 VOP_READ(d6ba2c00,f5fdb100,0,d7b879c4) at VOP_READ+0x3c vn_read(d76fd774,f5fdb100,0) at vn_read+0x95 dofilereadv(d7a0cb20,5,f5fdb100,0,f5fdb188) at dofilereadv+0xb5 sys_read(d7a0cb20,f5fdb190,f5fdb188) at sys_read+0x49 syscall(f5fdb1d0) at syscall+0x41d Xsyscall_untramp() at Xsyscall_untramp+0xa9 end of kernel ddb{2}> show register ds 0x10 es 0x10 fs 0x20 gs 0 edi 0xd0c61c69 mtl_rcs_offsets+0x163ee esi 0xf55435d8 ebp 0xf5fdae80 ebx 0xf553d5d8 edx 0x3fd ecx 0xc52ed985 eax 0x2d eip 0xd064f575 db_enter+0x5 cs 0x8 eflags 0x202 esp 0xf5fdae68 ss 0x10 db_enter+0x5: ret ddb{2}> mach ddbcpu 3 Stopped at db_enter+0x4: popl %ebp db_enter() at db_enter+0x4 i386_ipi_handler() at i386_ipi_handler+0x3f Xipi_untramp() at Xipi_untramp+0xc2 i386_bus_space_io_write_1(3f8,0,20) at i386_bus_space_io_write_1+0xe comcnputc(800,20) at comcnputc+0xb3 cnputc(20) at cnputc+0x21 db_putchar(75) at db_putchar+0x277 kprintf() at kprintf+0xecd db_printf(d0cbd686) at db_printf+0x37 panic(d0c61c69) at panic+0x69 kpageflttrap(f605200c,0) at kpageflttrap+0x137 trap(f605200c) at trap+0x25f calltrap() at calltrap+0xc putc(30,d7fd5648) at putc+0x3b ddb{3}> ddb{3}> show register ds 0x10 es 0x10 fs 0x20 gs 0 edi 0 esi 0xf5542ff8 ebp 0xf6051e28 ebx 0x6 edx 0x3f8 ecx 0x6 eax 0xd02ef560 i386_ipi_db eip 0xd064f574 db_enter+0x4 cs 0x50 eflags 0x200246 __kernel_base_phys+0x172 esp 0xf6051e28 ss 0x10 db_enter+0x4: popl %ebp ddb{3}> ps PID TID PPID UID S FLAGS WAIT COMMAND 66495 162740 92179 0 3 0x18000080 nanoslp perl 6874 83487 66761 73 3 0x8100082 sbwait dd 66761 388162 9478 73 3 0x810008a sigsusp sh *68916 63476 9478 73 7 0x19100010 syslogd 54422 376604 92179 0 3 0x18000080 nanoslp perl 9478 184642 92179 0 3 0x18100082 sbwait syslogd 47187 273616 92179 0 3 0x18000082 ttyin ttylog 89910 144380 92179 0 7 0x18000002 ttylog 92179 241768 65780 0 3 0x18000082 nanoslp perl 65780 244989 79759 0 3 0x810008a sigsusp sh 79759 19418 13992 0 3 0x1810008a sigsusp make 13992 499689 49023 0 3 0x810008a sigsusp sh 49023 344293 81972 0 3 0x1810008a sigsusp make 82437 262496 48222 91 3 0x18000092 kqread snmpd_metrics 83156 445484 48222 91 3 0x19100092 kqread snmpd 48222 385347 1 0 3 0x18100080 kqread snmpd 28484 337167 0 0 3 0x14200 bored sosplice 10920 105338 89774 0 3 0x18100082 piperd gzip 89774 101197 81972 0 3 0x8100082 piperd pax 81972 103305 58227 0 3 0x18000082 piperd perl 58227 19639 14807 0 3 0x810008a sigsusp ksh 14807 478472 91506 0 3 0x1800009a kqread sshd 22912 97132 1 0 3 0x18100083 ttyin getty 59734 144669 1 0 3 0x18100083 ttyin getty 88912 109485 1 0 3 0x18100083 ttyin getty 48436 497769 1 0 3 0x18100083 ttyin getty 4458 357659 1 0 3 0x18100083 ttyin getty 37567 168856 1 0 3 0x18100083 ttyin getty 5132 165217 1 0 3 0x18100098 kqread cron 69936 89460 1 99 3 0x19100090 kqread sndiod 88445 311622 1 110 3 0x18100090 kqread sndiod 87160 271303 56343 95 3 0x19100092 kqread smtpd 24080 130340 56343 103 3 0x19100092 kqread smtpd 16999 297795 56343 95 3 0x19100092 kqread smtpd 16674 305538 56343 95 3 0x18100092 kqread smtpd 10354 128758 56343 95 3 0x19100092 kqread smtpd 83399 429392 56343 95 3 0x19100092 kqread smtpd 56343 19986 1 0 3 0x18100080 kqread smtpd 91506 514211 1 0 3 0x18000088 kqread sshd 47560 286908 0 0 3 0x14200 acct acct 6834 234040 0 0 3 0x14280 nfsidl nfsio 9034 244410 0 0 3 0x14280 nfsidl nfsio 19584 122196 0 0 3 0x14280 nfsidl nfsio 47951 110834 0 0 3 0x14280 nfsidl nfsio 4406 311068 1 0 3 0x18100080 kqread ntpd 16297 155892 89542 83 3 0x18100092 kqread ntpd 89542 25530 1 83 3 0x19100092 kqread ntpd 62107 309092 5720 74 3 0x19100092 bpf pflogd 5720 510680 1 0 3 0x18000080 sbwait pflogd 3891 404875 22292 77 3 0x18100092 kqread dhcpleased 6467 50846 22292 77 3 0x18100092 kqread dhcpleased 22292 304417 1 0 3 0x18000080 kqread dhcpleased 97699 95954 93975 115 3 0x18100092 kqread slaacd 75700 397979 93975 115 3 0x18100092 kqread slaacd 93975 188749 1 0 3 0x18100080 kqread slaacd 97739 487380 0 0 3 0x14200 bored smr 53597 490227 0 0 3 0x14200 pgzero zerothread 51391 230609 0 0 3 0x14200 aiodoned aiodoned 20123 257987 0 0 3 0x14200 syncer update 69 169525 0 0 3 0x14200 cleaner cleaner 81897 226934 0 0 3 0x14200 reaper reaper 17130 96600 0 0 3 0x14200 pgdaemon pagedaemon 80528 104662 0 0 3 0x14200 bored wsdisplay0 66418 121964 0 0 3 0x14200 usbtsk usbtask 81259 113739 0 0 3 0x14200 usbatsk usbatsk 55149 274075 0 0 3 0x14200 bored sensors 52509 425472 0 0 3 0x40014200 acpi0 acpi0 99716 322609 0 0 7 0x40014200 idle7 13816 255518 0 0 7 0x40014200 idle6 19438 516832 0 0 7 0x40014200 idle5 12990 165070 0 0 7 0x40014200 idle4 94588 16061 0 0 3 0x40014200 idle3 24348 51863 0 0 3 0x40014200 idle2 15779 458677 0 0 7 0x40014200 idle1 86291 273017 0 0 3 0x14200 bored softnet3 42226 170805 0 0 3 0x14200 bored softnet2 59793 424398 0 0 3 0x14200 bored softnet1 15121 321698 0 0 3 0x14200 bored softnet0 96660 14196 0 0 3 0x14200 bored systqmp 1086 472575 0 0 3 0x14200 bored systq 98808 168088 0 0 3 0x14200 tmoslp softclockmp 37114 12489 0 0 3 0x40014200 tmoslp softclock 34875 214046 0 0 7 0x40014200 idle0 58685 477216 0 0 3 0x14200 kmalloc kmthread 1 54597 0 0 3 0x8000082 wait init 0 0 -1 0 3 0x10200 scheduler swapper ddb{3}> show uvm Current UVM status: pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12 736533 VM pages: 17336 active, 89088 inactive, 1 wired, 455624 free (56963 zero) min 10% (25) anon, 10% (25) vnode, 5% (12) vtext freemin=24551, free-target=32734, inactive-target=0, wired-max=245511 faults=324856630, traps=352534994, intrs=3454558, ctxswitch=170378607 fpuswitch=2308330 softint=30748978, syscalls=760351157, kmapent=535 fault counts: noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0 ok relocks(total)=361573(373193), anget(retries)=146838497(0), amapcopy=73104379 neighbor anon/obj pg=34140648/115322218, gets(lock/unlock)=32462604/374403 cases: anon=133497967, anoncow=13340530, obj=28897645, prcopy=3552128, przero=145087858 daemon and swap counts: woke=0, revs=0, scans=0, obscans=0, anscans=0 busy=0, freed=0, reactivate=0, deactivate=0 pageouts=0, pending=0, nswget=0 nswapdev=1 swpages=824403, swpginuse=0, swpgonly=0 paging=0 kernel pointers: objs(kern)=0xd0fd431c ttylog is a program used by syslogd regress to redirect console. I need this to test syslogd console logging. My suspicion is that there is a race in kernel console logging code and all these splassert warnings triggered the crash. bluhm