Index | Thread | Search

From:
Alexander Bluhm <alexander.bluhm@gmx.net>
Subject:
Re: Unsplice socket before start the teardown
To:
Vitaliy Makkoveev <mvs@openbsd.org>
Cc:
tech@openbsd.org
Date:
Fri, 16 Feb 2024 00:30:26 +0100

Download raw body.

Thread
On Thu, Feb 15, 2024 at 09:12:52PM +0300, Vitaliy Makkoveev wrote:
> 
> 
> > On 15 Feb 2024, at 20:08, Alexander Bluhm <alexander.bluhm@gmx.net> wrote:
> > 
> > On Thu, Feb 15, 2024 at 07:29:53PM +0300, Vitaliy Makkoveev wrote:
> >> On Thu, Feb 15, 2024 at 01:20:15PM +0100, Alexander Bluhm wrote:
> >>> On Thu, Feb 15, 2024 at 12:28:28PM +0300, Vitaliy Makkoveev wrote:
> >>>> Use barriers instead of `ssp_idleto' task and timeout(9)
> >>>> re-initialization to wait concurrent somove() and soidle().
> >>> 
> >>> Why do you move the unsplice code before the solinger sleep?
> >>> 
> >>> Are you sure that you don't change the semantics?  Is all data still
> >>> spliced before the connection closes?
> >>> 
> >>> bluhm
> >>> 
> >> 
> >> The socket is inaccessible from userland, but SO_LINGER keeps it alive,
> >> mostly to perform transmission of pending data. The new transmissions
> >> and the reception are impossible. However, this is not true for spliced
> >> sockets because they are still accessible by the spliced peer, but not
> >> forever.
> >> 
> >> IIRC, only tcp_ctlinput() will wakeup us, so we leave this loop after
> >> TCP_LINGERTIME or if the remote peer was disconnected.
> >> 
> >> So, for the spliced case, you can receive data and awake somove(), but
> >> reach `so_linger' timeout and follow the destruction path. Or you could
> >> pru_send() data to dying `sosp' which already reached `so_linger'
> >> timeout. You don't know is your spliced peer closed, so I see no
> >> difference, will it be unspliced now or after TCP_LINGERTIME seconds.
> >> 
> >> Since the pru_detach() is not called, the remote also doesn't know that
> >> socket is closed.
> >> 
> >> I wanted to move it from sofree(). Since I see no differences, the idea
> >> to do pru_detach() after disconnect and unsplice looked good.
> >> 
> >> 
> >> This diff keeps current notation, so the unsplice moved before sofree().
> > 
> > As there are a lot of corner cases that I have fixed in the years
> > before, I fear that movin code around may change the semantics.  So
> > I like this versiom much more, I have thrown it on regress machine.
> > 
> > Before sofree and unsplice was called from in_pcbdetach().  Are you
> > sure that this is no longer necessary?  TCP can get a reset and
> > call tcp_close().  It has to unsplice then.
> 
> in_pcbdetach() detaches and destroys PCB, but keeps socket alive. It
> sets so_pcb to NULL, but keeps SS_NOFDREF clean, so you return just
> after enter:
> 
> sofree(struct socket *so, int keep_lock)
> {
>         int persocket = solock_persocket(so);
> 
>         soassertlocked(so);
> 
>         if (so->so_pcb || (so->so_state & SS_NOFDREF) == 0) {
>                 if (!keep_lock)
>                         sounlock(so);
>                 return;
>         }
> 
> Otherwise, this this will be use-after-free issue, because the file
> descriptor is still accessible from userland.

Crash during regress

START	sys/kern/sosplice/loop	2024-02-15T20:42:50Z

rm -f a.out [Ee]rrs mklog *.core y.tab.h       *.log ktrace.out stamp-*  

==== run-chain-ipv4-prototcp-count1 ====
ulimit -n 500;  SUDO=""  perl /usr/src/regress/sys/kern/sosplice/loop/chain.pl -4 -c 1 -p tcp 

==== run-chain-ipv4-prototcp-count2 ====
ulimit -n 500;  SUDO=""  perl /usr/src/regress/sys/kern/sosplice/loop/chain.pl -4 -c 2 -p tcp 
Timeout, server ot1 not responding.

[-- MARK -- Thu Feb 15 21:40:00 2024]
panic: uvm_fault(0xd7b63aa8, 0x0, 0, 1) -> e
Stopped at      db_enter+0x4:   popl    %ebp
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
*294092  68423      0  0x18000002          0    3  perl
db_enter() at db_enter+0x4
panic(d0c61049) at panic+0x7a
kpageflttrap(f5f0f66c,0) at kpageflttrap+0x137
trap(f5f0f66c) at trap+0x25f
calltrap() at calltrap+0xc
solock_persocket(d76cb914) at solock_persocket+0xb
sofree(d76cb914,0) at sofree+0x15
soclose(d76cb914,0) at soclose+0x298
soo_close(d7820ee0,d76edd1c) at soo_close+0x1e
fdrop(d7820ee0,d76edd1c) at fdrop+0x6a
closef(d7820ee0,d76edd1c) at closef+0x9d
fdrelease(d76edd1c,6) at fdrelease+0x7a
sys_close(d76edd1c,f5f0f7f0,f5f0f7e8) at sys_close+0x54
syscall(f5f0f830) at syscall+0x55e
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.

ddb{3}> show panic
*cpu3: uvm_fault(0xd7b63aa8, 0x0, 0, 1) -> e

ddb{3}> trace
db_enter() at db_enter+0x4
panic(d0c61049) at panic+0x7a
kpageflttrap(f5f0f66c,0) at kpageflttrap+0x137
trap(f5f0f66c) at trap+0x25f
calltrap() at calltrap+0xc
solock_persocket(d76cb914) at solock_persocket+0xb
sofree(d76cb914,0) at sofree+0x15
soclose(d76cb914,0) at soclose+0x298
soo_close(d7820ee0,d76edd1c) at soo_close+0x1e
fdrop(d7820ee0,d76edd1c) at fdrop+0x6a
closef(d7820ee0,d76edd1c) at closef+0x9d
fdrelease(d76edd1c,6) at fdrelease+0x7a
sys_close(d76edd1c,f5f0f7f0,f5f0f7e8) at sys_close+0x54
syscall(f5f0f830) at syscall+0x55e
Xsyscall_untramp() at Xsyscall_untramp+0xa9
end of kernel

ddb{3}> show register
ds                  0x10
es                  0x10
fs                  0x20
gs                     0
edi           0xd0c61049        acx100_txpower_maxim+0x17d2d
esi                    0
ebp           0xf5f0f5e4
ebx           0xf55435d8
edx                0x3fd
ecx           0x2543f619
eax                 0x2d
eip           0xd0acde54        db_enter+0x4
cs                   0x8
eflags             0x202
esp           0xf5f0f5e4
ss                  0x10
db_enter+0x4:   popl    %ebp

ddb{3}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
*68423  294092   1566      0  7  0x18000002                perl
  1566  381024  50079      0  3   0x810008a  sigsusp       sh
 50079  221897  51550      0  3  0x1810008a  sigsusp       make
 51550  141637  42441      0  3   0x810008a  sigsusp       sh
 11165  227136      0      0  3     0x14200  bored         sosplice
 42441  172021  38678      0  3  0x1810008a  sigsusp       make
 98739  505918  71284      0  3  0x18100082  piperd        gzip
 71284   40345  38678      0  3   0x8100082  piperd        pax
 38678  326846  51296      0  3  0x18000082  piperd        perl
 51296  212340  99795      0  3   0x810008a  sigsusp       ksh
 99795  178440  68741      0  3  0x1800009a  kqread        sshd
 52700  433251      1      0  3  0x18100083  ttyin         getty
 45855  260933      1      0  3  0x18100083  ttyin         getty
 39444  214411      1      0  3  0x18100083  ttyin         getty
 51123  462323      1      0  3  0x18100083  ttyin         getty
 92546  126208      1      0  3  0x18100083  ttyin         getty
 10113  456162      1      0  3  0x18100083  ttyin         getty
 31520  477910      1      0  3  0x18100098  kqread        cron
 42614  426049      1     99  3  0x19100090  kqread        sndiod
 80379  237429      1    110  3  0x18100090  kqread        sndiod
 95484  403418  93616     95  3  0x19100092  kqread        smtpd
 85783   51147  93616    103  3  0x19100092  kqread        smtpd
 95595  499204  93616     95  3  0x19100092  kqread        smtpd
 26227   51221  93616     95  3  0x18100092  kqread        smtpd
 97457  238234  93616     95  3  0x19100092  kqread        smtpd
 75053  519968  93616     95  3  0x19100092  kqread        smtpd
 93616  261812      1      0  3  0x18100080  kqread        smtpd
 17706  445612  30932     91  3  0x18000092  kqread        snmpd_metrics
 30932  420221      1      0  3  0x18100080  kqread        snmpd
 40427  410777      1     91  3  0x19100092  kqread        snmpd
 68741  355001      1      0  3  0x18000088  kqread        sshd
  2780  263124      0      0  3     0x14200  acct          acct
 13887  496736      0      0  3     0x14280  nfsidl        nfsio
 10297   95168      0      0  3     0x14280  nfsidl        nfsio
 88712  206326      0      0  3     0x14280  nfsidl        nfsio
 22183   22208      0      0  3     0x14280  nfsidl        nfsio
 84524  154609      1      0  3  0x18100080  kqread        ntpd
 40420  405330  51338     83  3  0x18100092  kqread        ntpd
 51338  432705      1     83  3  0x19100092  kqread        ntpd
  6655  208118  28970     74  3  0x19100092  bpf           pflogd
 28970   27438      1      0  3  0x18000080  netio         pflogd
 91899  307352  42956     73  3  0x19100090  kqread        syslogd
 42956  136252      1      0  3  0x18100082  netio         syslogd
 67690   26924  99660     77  3  0x18100092  kqread        dhcpleased
 71323  300988  99660     77  3  0x18100092  kqread        dhcpleased
 99660  442179      1      0  3  0x18000080  kqread        dhcpleased
 89994  486637  91226    115  3  0x18100092  kqread        slaacd
 89258  188245  91226    115  3  0x18100092  kqread        slaacd
 91226  179241      1      0  3  0x18100080  kqread        slaacd
 80996  220821      0      0  3     0x14200  bored         smr
 94808  343068      0      0  3     0x14200  pgzero        zerothread
 10643  490789      0      0  3     0x14200  aiodoned      aiodoned
 38625  478999      0      0  3     0x14200  syncer        update
 96730   41784      0      0  3     0x14200  cleaner       cleaner
 75318  433043      0      0  3     0x14200  reaper        reaper
 15645  104664      0      0  3     0x14200  pgdaemon      pagedaemon
 42378  487419      0      0  3     0x14200  bored         wsdisplay0
  3199  422720      0      0  3     0x14200  usbtsk        usbtask
 91003  277263      0      0  3     0x14200  usbatsk       usbatsk
 98115  126124      0      0  3     0x14200  bored         sensors
 28530  399295      0      0  3  0x40014200  acpi0         acpi0
 35845  415026      0      0  7  0x40014200                idle7
 60184   31409      0      0  7  0x40014200                idle6
 65844  419059      0      0  7  0x40014200                idle5
  8954   38915      0      0  7  0x40014200                idle4
 74670  191490      0      0  3  0x40014200                idle3
 64306  222040      0      0  7  0x40014200                idle2
  1038   44274      0      0  7  0x40014200                idle1
 56328   94044      0      0  3     0x14200  bored         softnet3
 24336  335491      0      0  3     0x14200  bored         softnet2
 34788  411987      0      0  3     0x14200  bored         softnet1
 29367  155662      0      0  3     0x14200  bored         softnet0
 37233  511810      0      0  3     0x14200  bored         systqmp
 76004   97209      0      0  3     0x14200  bored         systq
 62673  110815      0      0  3     0x14200  tmoslp        softclockmp
 78786  332800      0      0  3  0x40014200  tmoslp        softclock
 22207  278488      0      0  7  0x40014200                idle0
 41103  513492      0      0  3     0x14200  kmalloc       kmthread
     1  520550      0      0  3   0x8000082  wait          init
     0       0     -1      0  3     0x10200  scheduler     swapper

ddb{3}> show locks
exclusive rwlock netlock r = 0 (0xd0ec12f8)

ddb{3}> show uvm
Current UVM status:
  pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
  736543 VM pages: 14118 active, 95046 inactive, 1 wired, 496466 free (62063 zero)
  min  10% (25) anon, 10% (25) vnode, 5% (12) vtext
  freemin=24551, free-target=32734, inactive-target=0, wired-max=245514
  faults=242439947, traps=247588368, intrs=2185812, ctxswitch=68836215 fpuswitch=1881578
  softint=7236222, syscalls=563057153, kmapent=84
  fault counts:
    noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
    ok relocks(total)=262758(273410), anget(retries)=97537690(0), amapcopy=47681050
    neighbor anon/obj pg=22588133/68129238, gets(lock/unlock)=19423016/274757
    cases: anon=90493043, anoncow=7044647, obj=17267032, prcopy=2143985, przero=125011129
  daemon and swap counts:
    woke=0, revs=0, scans=0, obscans=0, anscans=0
    busy=0, freed=0, reactivate=0, deactivate=0
    pageouts=0, pending=0, nswget=0
    nswapdev=1
    swpages=824403, swpginuse=0, swpgonly=0 paging=0
  kernel pointers:
    objs(kern)=0xd0fd32fc

ddb{3}> mach ddbcpu 0
Stopped at      db_enter+0x4:   popl    %ebp
db_enter() at db_enter+0x4
i386_ipi_handler() at i386_ipi_handler+0x3f
Xipi_untramp() at Xipi_untramp+0xc2
acpicpu_idle() at acpicpu_idle+0xed
cpu_idle_cycle() at cpu_idle_cycle+0xc

ddb{3}> mach ddbcpu 1
Stopped at      db_enter+0x4:   popl    %ebp
db_enter() at db_enter+0x4
i386_ipi_handler() at i386_ipi_handler+0x3f
Xipi_untramp() at Xipi_untramp+0xc2
acpicpu_idle() at acpicpu_idle+0xed
cpu_idle_cycle() at cpu_idle_cycle+0xc

ddb{1}> mach ddbcpu 2
Stopped at      db_enter+0x4:   popl    %ebp
db_enter() at db_enter+0x4
i386_ipi_handler() at i386_ipi_handler+0x3f
Xipi_untramp() at Xipi_untramp+0xc2
acpicpu_idle() at acpicpu_idle+0xed
cpu_idle_cycle() at cpu_idle_cycle+0xc