Index | Thread | Search

From:
Martin Pieuchot <mpi@grenadille.net>
Subject:
Re: btrace, identifying syscall+0x5c4
To:
tech <tech@openbsd.org>
Date:
Tue, 3 Dec 2024 08:27:48 +0100

Download raw body.

Thread
On 02/12/24(Mon) 23:43, Stuart Henderson wrote:
> On 2024/12/02 17:47, Martin Pieuchot wrote:
> > On 02/12/24(Mon) 14:49, Stuart Henderson wrote:
> > > On 2024/12/02 14:35, Stuart Henderson wrote:
> > > > On 2024/12/02 15:12, Claudio Jeker wrote:
> > > > > On Mon, Dec 02, 2024 at 01:53:02PM +0000, Stuart Henderson wrote:
> > > > > > Thought I'd take a look at a kernel profile from a machine which is
> > > > > > fairly "interesting" in terms of kernel spin. (Running -current).
> > > > > > 
> > > > > > In the attached output from running "btrace kprofile.bt" I have a
> > > > > > lot of time accounted for under syscall+0x5c4, which has _kernel_lock
> > > > > > directly next to it in the traces, and I was wondering what that is.
> > > > > > Bit confused by disasm of trap.o but my guess is that it might be
> > > > > > a lock relating to dt(4) itself, is that right?
> > > > >  
> > > > > I think this is the bit of syscall that grabs the kernel lock for any
> > > > > syscall not marked NOLOCK. In your case this is mostly fstat*, open*
> > > > > or the bits under syscall+0x5e7.
> > > > 
> > > > Thank you.
> > > > 
> > > > > Your system does a lot of parallel disk IO and that is all still using the
> > > > > KERNEL_LOCK.
> > > > 
> > > > oh, I should try "VFS syscalls & KERNEL_LOCK()" on this machine!
> > > 
> > > ...and fwiw that looks like this:
> > 
> > Interesting.  What kind of workload is that?  You have the same amount
> > of contention spread across open(2), fstat(2), read(2) and close(2).
> 
> It's a monitoring box, the main thing causing load is librenms (php
> web + cli programs). The main load generator is the every-5-mins
> polling, this is a php cli program run from cron which calls
> snmpget and snmpbulkwalk to gather stats, using them to update
> mariadb and rrdtool databases (mariadb/rrdtool are running on the
> same machine).
> 
> It also runs smokeping (fping and again rrdtool updates but on a much
> smaller scale) and icinga2 (various small check programs, mariadb
> updates, redis updates) but those are lighter weight (and run by their
> own schedulers with time delays spreading around more, rather than
> being directly wall-clock synced as with librenms).

Thanks.

> hw.ncpu=24 hw.ncpuonline=12 (Xeon E5-2630 v2).

I wonder if you don't have a piece of software that create a number of
threads based on ncpu.

> > If fstat(2) is done on a pipe, socket or kqueue the updated diff below
> > might help.  It pushes the KERNEL_LOCK() down to vn_statfile() in this
> > syscall.
> 
> Attached profile was created with this diff (machine had been up for
> a few hours before I did that).
> 
> I then wondered what was running fstat(2) and tried ktrace on one of
> the php librenms poller processes to see if it was that - quickly
> resulting in a crash with
> 
> *cpu1: kernel diagnostic assertion "_kernel_lock_held()" failed: file
> "/sys/kern/kern_ktrace.c", line 660

Could you send me the picture of the trace once cpu1?  The one below is
missing the interesting bits where the panic comes frome.

> I haven't tried to trigger again but I suspect it will not be too
> difficult - probably best if I don't trash the filesystems on this
> box too badly if possible ;) If you can't reproduce it let me know
> and I can try to provoke it on a WITNESS kernel.
> 
> (I didn't try ktrace with the earlier diff before fstat unlock).

It's certainly a missing KERNEL_LOCK() dance around some ktrace in my
diff.

> I have screenshots with the rest of the details from traces if
> wanted, didn't upload them yet. I didn't save other bits from ddb
> this time (no 'show all procs', etc).
> 
> I'll leave it running overnight with that same kernel including the
> fstat unlock and see if I run into any other issues (without
> poking it with ktrace).

Thanks.

> WaAnRiNcI:N Gk:e rSnPeLl  NdOiTa gLnOoWsEtRiEcD  aOsNs eSrYtSiCoAnL L"
> _5k e-d
> 3n3e1l8_2l9o7c6k0_ hEeXlIdT( )0"  9f
> iStopped at  savectx                a
> +0xae:  movl    $0,%gs:0x680
> 
> procs on cpu: (PRFLAGS all 0x2, PFLAGS 0 except for rrdtool 0x1)
>   5  snmpget
>   4  snmpget
>   2  snmpbulkwalk
>   6  snmpbulkwalk
> * 3  snmpget
>   7K snmpbulkwalk
>  10  snmpget
>   8  snmpget
>  11  snmpget
>   0  snmpbulkwalk
>   9  php-8.3
>   1  rrdtool
> 
> ddb{3}> show panic
> *cpu1: kernel diagnostic asserton "_kernel_lock_held()" failed: file
> "/sys/kern/kern_ktrace.c", line 660
> ddb{3}> tr
> savectx() at savectx+0xae
> end of kernel
> ...
> ddb{0} tr
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> end of kernel
> 
> ddb{1} tr
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> x86_bus_space_io_write_1+0x1d
> pcdisplay_cursor+0xa6
> wsemul_vt100_output+0xc7
> wsdisplay_cnputc+0x65
> cnputc+0x3b
> db_putchar+0x485
> kprintf+0x1351
> db_printf+0x6d
> panic+0xc2
> __asert+0x29
> ktrwriteraw+0x28f
> 
> ddb{2} tr
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> alltraps+0x135
> end of kernel
> 
> ddb{3} -> above
> 
> ddb{4} tr
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> _kernel_lock+0xb2
> doopenat+0x1ef
> syscall+0x620
> Xsyscall+0x128
> end of kernel
> 
> ddb{5} tr
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> _kernel_lock+0xa0
> sys_fstat+0x98
> syscall+0x620
> Xsyscall+0x128
> end of kernel
> 
> ddb{6} tr
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> pagezero+0x21
> uvm_pagealloc+0xe2
> uvm_fault_lower+0x159
> uvm_fault+0x1f7
> upageflttrap+0x6c
> usertrap+0x217
> recall_trap+0x8
> 
> ddb{7} tr
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> uvm_pmr_cache_put+0x206
> uvm_km_pgremove_intrsafe+0x48
> uvm_unmap_kill_entry_withlock+0xf2
> uvm_unmap_remove+0x32f
> uvm_unmap+0x7d
> free+0x172
> ufs_readdir+0x2c7
> VOP_READDIR+0x46
> sys_getdents+0x17a
> syscall+0x5e7
> Xsyscall+0x128
> end of kernel
> 
> ddb{8} tr
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> _kernel_lock+0xa9
> doopenat+0x1ef
> syscall+0x620
> Xsyscall+0x128
> end of kernel
> 
> ddb{9} tr
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> uvm_fault+0x4c
> upageflttrap+0x6c
> usertrap+0x217
> recall_trap+0x8
> end of kernel
> 
> ddb{10} tr
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> krernel_lock+0xa6
> vn_read+0x44
> dofilereadv+0x163
> sys_read+0x55
> syscall+0x620
> Xsyscall+0x128
> end of kernel
> 
> ddb{11} tr
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> end of kernel
> 
> ddb{12} tr, same for 12-23 (smp-disabled)
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> acpicpu_idle+0x239
> end of kernel
>