Index | Thread | Search

From:
Alexander Bluhm <bluhm@openbsd.org>
Subject:
Re: Don't take solock() in soreceive() for tcp(4) sockets
To:
Vitaliy Makkoveev <mvs@openbsd.org>
Cc:
tech@openbsd.org
Date:
Wed, 17 Apr 2024 14:19:42 +0200

Download raw body.

Thread
On Tue, Apr 16, 2024 at 02:33:34PM +0300, Vitaliy Makkoveev wrote:
> On Tue, Apr 16, 2024 at 01:36:40PM +0300, Vitaliy Makkoveev wrote:
> > Make the reception locking fine grained too. pru_rcvd() requires
> > solock(), so take it around. I don't know is the unlocked connection
> > state SS_ bits check safe, so take `sb_mtx' around corresponding
> > `so_state' modifications. If this serialization is not required, it
> > could be reworked with another diff later.
> > 
> > So, the reception of all the inet sockets moved out of netlock.
> > 
> 
> Sorry, this is the right diff.

With this diff my i386 regress machine crashed.  It was somewhere
in regress/usr.sbin/syslogd.

==== run-args-client-tls-tcp.pl ====
time SUDO= KTRACE= SYSLOGD= perl -I/usr/src/regress/usr.sbin/syslogd -I/usr/src/regress/usr.sbin/syslogd/obj /usr/src/regress/usr.sbin/syslogd/syslogd.pl /usr/src/regress/usr.sbin/syslogd/args-client-tls-tcp.pl
Timeout, server ot1 not responding.

There are many messages:

splassert: sbmtxassertlocked: want 0 have 1
Starting stack trace...
sbdrop(0,1,d0cf1c14) at sbdrop+0x53
sbdrop(f67cbb84,f67cbbf0,17) at sbdrop+0x53
sbflush(f67cbb84,f67cbbf0) at sbflush+0x40
tcp_dodisconnect(d8af86f4) at tcp_dodisconnect+0x47
tcp_disconnect(f67cbb84) at tcp_disconnect+0x2b
soclose(f67cbb84,0) at soclose+0x6a
soo_close(d7ac2074,f60ed1c0) at soo_close+0x1e
fdrop(d7ac2074,f60ed1c0) at fdrop+0x6a
closef(d7ac2074,f60ed1c0) at closef+0x9d
fdrelease(f60ed1c0,3) at fdrelease+0x7a
sys_close(f60ed1c0,f6b00ce0,f6b00cd8) at sys_close+0x54
syscall(f6b00d20) at syscall+0x41d
Xsyscall_untramp() at Xsyscall_untramp+0xa9
end of kernel
End of stack trace.

Final output:

splassert: sbmtxassertlocked: want 0 have 1
Starting stack trace...
sbdrop(0,1,d0cf1c14) at sbdrop+0x53
sbdrop(d7706740,d77067ac,18) at sbdrop+0x53
sbflush(d7706740,d77067ac) at sbflush+0x40
tcp_dodisconnect(d88858c0) at tcp_dodisconnect+0x47
tcp_disconnect(d7706740) at tcp_disconnect+0x2b
soclose(d7706740,0) at soclose+0x6a
soo_close(d782f700,d7a0cb20) at soo_close+0x1e
fdrop(d782f700,d7a0cb20) at fdrop+0x6a
closef(d782f700,d7a0cb20) at closef+0x9d
fdfree(d7a0cb20) at fdfree+0x5c
exit1(d7a0cb20,0,0,1) at exit1+0x1cc
sys_exit(d7a0cb20,f6014c30,f6014c28) at sys_exit+0x14
syscall(f6014c70) at syscall+0x3f4
Xsyscall_untramp() at Xsyscall_untramp+0xa9
end of kernel
End of stack trace.
ppaniacn:i c : u  v m _ f a u l t ( 0 x f8e 3 c 0 6    0, 0x0 , 0 ,   1)  - >   e
 Stopped at      db_enter+0x4:   popl    %ebp
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
  63476  68916     73  0x19100010          0    3  syslogd
*144380  89910      0  0x18000002          0    2K ttylog
db_enter() at db_enter+0x4
panic(d0c61c69) at panic+0x7a
kpageflttrap(f5fdaeec,0) at kpageflttrap+0x137
trap(f5fdaeec) at trap+0x25f
calltrap() at calltrap+0xc
docopyf() at docopyf+0x5
ptcread(600,f5fdb100,0) at ptcread+0x1a7
spec_read(f5fdb070) at spec_read+0x80
VOP_READ(d6ba2c00,f5fdb100,0,d7b879c4) at VOP_READ+0x3c
vn_read(d76fd774,f5fdb100,0) at vn_read+0x95
dofilereadv(d7a0cb20,5,f5fdb100,0,f5fdb188) at dofilereadv+0xb5
sys_read(d7a0cb20,f5fdb190,f5fdb188) at sys_read+0x49
syscall(f5fdb1d0) at syscall+0x41d
Xsyscall_untramp() at Xsyscall_untramp+0xa9
end of kernel
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{2}> [-- MARK -- Wed Apr 17 03:45:00 2024]

ddb{2}> show panic
*cpu3: uvm_fault(0xf8e3c154, 0x0, 0, 2) -> e
 cpu2: uvm_fault(0xf8e3c060, 0x0, 0, 1) -> e

ddb{2}> trace
db_enter() at db_enter+0x5
kpageflttrap(f5fdaeec,0) at kpageflttrap+0x137
trap(f5fdaeec) at trap+0x25f
calltrap() at calltrap+0xc
docopyf() at docopyf+0x5
ptcread(600,f5fdb100,0) at ptcread+0x1a7
spec_read(f5fdb070) at spec_read+0x80
VOP_READ(d6ba2c00,f5fdb100,0,d7b879c4) at VOP_READ+0x3c
vn_read(d76fd774,f5fdb100,0) at vn_read+0x95
dofilereadv(d7a0cb20,5,f5fdb100,0,f5fdb188) at dofilereadv+0xb5
sys_read(d7a0cb20,f5fdb190,f5fdb188) at sys_read+0x49
syscall(f5fdb1d0) at syscall+0x41d
Xsyscall_untramp() at Xsyscall_untramp+0xa9
end of kernel

ddb{2}> show register
ds                  0x10
es                  0x10
fs                  0x20
gs                     0
edi           0xd0c61c69        mtl_rcs_offsets+0x163ee
esi           0xf55435d8
ebp           0xf5fdae80
ebx           0xf553d5d8
edx                0x3fd
ecx           0xc52ed985
eax                 0x2d
eip           0xd064f575        db_enter+0x5
cs                   0x8
eflags             0x202
esp           0xf5fdae68
ss                  0x10
db_enter+0x5:   ret

ddb{2}> mach ddbcpu 3
Stopped at      db_enter+0x4:   popl    %ebp
db_enter() at db_enter+0x4
i386_ipi_handler() at i386_ipi_handler+0x3f
Xipi_untramp() at Xipi_untramp+0xc2
i386_bus_space_io_write_1(3f8,0,20) at i386_bus_space_io_write_1+0xe
comcnputc(800,20) at comcnputc+0xb3
cnputc(20) at cnputc+0x21
db_putchar(75) at db_putchar+0x277
kprintf() at kprintf+0xecd
db_printf(d0cbd686) at db_printf+0x37
panic(d0c61c69) at panic+0x69
kpageflttrap(f605200c,0) at kpageflttrap+0x137
trap(f605200c) at trap+0x25f
calltrap() at calltrap+0xc
putc(30,d7fd5648) at putc+0x3b
ddb{3}>

ddb{3}> show register
ds                  0x10
es                  0x10
fs                  0x20
gs                     0
edi                    0
esi           0xf5542ff8
ebp           0xf6051e28
ebx                  0x6
edx                0x3f8
ecx                  0x6
eax           0xd02ef560        i386_ipi_db
eip           0xd064f574        db_enter+0x4
cs                  0x50
eflags          0x200246        __kernel_base_phys+0x172
esp           0xf6051e28
ss                  0x10
db_enter+0x4:   popl    %ebp

ddb{3}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
 66495  162740  92179      0  3  0x18000080  nanoslp       perl
  6874   83487  66761     73  3   0x8100082  sbwait        dd
 66761  388162   9478     73  3   0x810008a  sigsusp       sh
*68916   63476   9478     73  7  0x19100010                syslogd
 54422  376604  92179      0  3  0x18000080  nanoslp       perl
  9478  184642  92179      0  3  0x18100082  sbwait        syslogd
 47187  273616  92179      0  3  0x18000082  ttyin         ttylog
 89910  144380  92179      0  7  0x18000002                ttylog
 92179  241768  65780      0  3  0x18000082  nanoslp       perl
 65780  244989  79759      0  3   0x810008a  sigsusp       sh
 79759   19418  13992      0  3  0x1810008a  sigsusp       make
 13992  499689  49023      0  3   0x810008a  sigsusp       sh
 49023  344293  81972      0  3  0x1810008a  sigsusp       make
 82437  262496  48222     91  3  0x18000092  kqread        snmpd_metrics
 83156  445484  48222     91  3  0x19100092  kqread        snmpd
 48222  385347      1      0  3  0x18100080  kqread        snmpd
 28484  337167      0      0  3     0x14200  bored         sosplice
 10920  105338  89774      0  3  0x18100082  piperd        gzip
 89774  101197  81972      0  3   0x8100082  piperd        pax
 81972  103305  58227      0  3  0x18000082  piperd        perl
 58227   19639  14807      0  3   0x810008a  sigsusp       ksh
 14807  478472  91506      0  3  0x1800009a  kqread        sshd
 22912   97132      1      0  3  0x18100083  ttyin         getty
 59734  144669      1      0  3  0x18100083  ttyin         getty
 88912  109485      1      0  3  0x18100083  ttyin         getty
 48436  497769      1      0  3  0x18100083  ttyin         getty
  4458  357659      1      0  3  0x18100083  ttyin         getty
 37567  168856      1      0  3  0x18100083  ttyin         getty
  5132  165217      1      0  3  0x18100098  kqread        cron
 69936   89460      1     99  3  0x19100090  kqread        sndiod
 88445  311622      1    110  3  0x18100090  kqread        sndiod
 87160  271303  56343     95  3  0x19100092  kqread        smtpd
 24080  130340  56343    103  3  0x19100092  kqread        smtpd
 16999  297795  56343     95  3  0x19100092  kqread        smtpd
 16674  305538  56343     95  3  0x18100092  kqread        smtpd
 10354  128758  56343     95  3  0x19100092  kqread        smtpd
 83399  429392  56343     95  3  0x19100092  kqread        smtpd
 56343   19986      1      0  3  0x18100080  kqread        smtpd
 91506  514211      1      0  3  0x18000088  kqread        sshd
 47560  286908      0      0  3     0x14200  acct          acct
  6834  234040      0      0  3     0x14280  nfsidl        nfsio
  9034  244410      0      0  3     0x14280  nfsidl        nfsio
 19584  122196      0      0  3     0x14280  nfsidl        nfsio
 47951  110834      0      0  3     0x14280  nfsidl        nfsio
  4406  311068      1      0  3  0x18100080  kqread        ntpd
 16297  155892  89542     83  3  0x18100092  kqread        ntpd
 89542   25530      1     83  3  0x19100092  kqread        ntpd
 62107  309092   5720     74  3  0x19100092  bpf           pflogd
  5720  510680      1      0  3  0x18000080  sbwait        pflogd
  3891  404875  22292     77  3  0x18100092  kqread        dhcpleased
  6467   50846  22292     77  3  0x18100092  kqread        dhcpleased
 22292  304417      1      0  3  0x18000080  kqread        dhcpleased
 97699   95954  93975    115  3  0x18100092  kqread        slaacd
 75700  397979  93975    115  3  0x18100092  kqread        slaacd
 93975  188749      1      0  3  0x18100080  kqread        slaacd
 97739  487380      0      0  3     0x14200  bored         smr
 53597  490227      0      0  3     0x14200  pgzero        zerothread
 51391  230609      0      0  3     0x14200  aiodoned      aiodoned
 20123  257987      0      0  3     0x14200  syncer        update
    69  169525      0      0  3     0x14200  cleaner       cleaner
 81897  226934      0      0  3     0x14200  reaper        reaper
 17130   96600      0      0  3     0x14200  pgdaemon      pagedaemon
 80528  104662      0      0  3     0x14200  bored         wsdisplay0
 66418  121964      0      0  3     0x14200  usbtsk        usbtask
 81259  113739      0      0  3     0x14200  usbatsk       usbatsk
 55149  274075      0      0  3     0x14200  bored         sensors
 52509  425472      0      0  3  0x40014200  acpi0         acpi0
 99716  322609      0      0  7  0x40014200                idle7
 13816  255518      0      0  7  0x40014200                idle6
 19438  516832      0      0  7  0x40014200                idle5
 12990  165070      0      0  7  0x40014200                idle4
 94588   16061      0      0  3  0x40014200                idle3
 24348   51863      0      0  3  0x40014200                idle2
 15779  458677      0      0  7  0x40014200                idle1
 86291  273017      0      0  3     0x14200  bored         softnet3
 42226  170805      0      0  3     0x14200  bored         softnet2
 59793  424398      0      0  3     0x14200  bored         softnet1
 15121  321698      0      0  3     0x14200  bored         softnet0
 96660   14196      0      0  3     0x14200  bored         systqmp
  1086  472575      0      0  3     0x14200  bored         systq
 98808  168088      0      0  3     0x14200  tmoslp        softclockmp
 37114   12489      0      0  3  0x40014200  tmoslp        softclock
 34875  214046      0      0  7  0x40014200                idle0
 58685  477216      0      0  3     0x14200  kmalloc       kmthread
     1   54597      0      0  3   0x8000082  wait          init
     0       0     -1      0  3     0x10200  scheduler     swapper

ddb{3}> show uvm
Current UVM status:
  pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
  736533 VM pages: 17336 active, 89088 inactive, 1 wired, 455624 free (56963 zero)
  min  10% (25) anon, 10% (25) vnode, 5% (12) vtext
  freemin=24551, free-target=32734, inactive-target=0, wired-max=245511
  faults=324856630, traps=352534994, intrs=3454558, ctxswitch=170378607 fpuswitch=2308330
  softint=30748978, syscalls=760351157, kmapent=535
  fault counts:
    noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
    ok relocks(total)=361573(373193), anget(retries)=146838497(0), amapcopy=73104379
    neighbor anon/obj pg=34140648/115322218, gets(lock/unlock)=32462604/374403
    cases: anon=133497967, anoncow=13340530, obj=28897645, prcopy=3552128, przero=145087858
  daemon and swap counts:
    woke=0, revs=0, scans=0, obscans=0, anscans=0
    busy=0, freed=0, reactivate=0, deactivate=0
    pageouts=0, pending=0, nswget=0
    nswapdev=1
    swpages=824403, swpginuse=0, swpgonly=0 paging=0
  kernel pointers:
    objs(kern)=0xd0fd431c

ttylog is a program used by syslogd regress to redirect console.
I need this to test syslogd console logging.  My suspicion is that
there is a race in kernel console logging code and all these splassert
warnings triggered the crash.

bluhm