Download raw body.
btrace, identifying syscall+0x5c4
On 02/12/24(Mon) 23:43, Stuart Henderson wrote:
> On 2024/12/02 17:47, Martin Pieuchot wrote:
> > On 02/12/24(Mon) 14:49, Stuart Henderson wrote:
> > > On 2024/12/02 14:35, Stuart Henderson wrote:
> > > > On 2024/12/02 15:12, Claudio Jeker wrote:
> > > > > On Mon, Dec 02, 2024 at 01:53:02PM +0000, Stuart Henderson wrote:
> > > > > > Thought I'd take a look at a kernel profile from a machine which is
> > > > > > fairly "interesting" in terms of kernel spin. (Running -current).
> > > > > >
> > > > > > In the attached output from running "btrace kprofile.bt" I have a
> > > > > > lot of time accounted for under syscall+0x5c4, which has _kernel_lock
> > > > > > directly next to it in the traces, and I was wondering what that is.
> > > > > > Bit confused by disasm of trap.o but my guess is that it might be
> > > > > > a lock relating to dt(4) itself, is that right?
> > > > >
> > > > > I think this is the bit of syscall that grabs the kernel lock for any
> > > > > syscall not marked NOLOCK. In your case this is mostly fstat*, open*
> > > > > or the bits under syscall+0x5e7.
> > > >
> > > > Thank you.
> > > >
> > > > > Your system does a lot of parallel disk IO and that is all still using the
> > > > > KERNEL_LOCK.
> > > >
> > > > oh, I should try "VFS syscalls & KERNEL_LOCK()" on this machine!
> > >
> > > ...and fwiw that looks like this:
> >
> > Interesting. What kind of workload is that? You have the same amount
> > of contention spread across open(2), fstat(2), read(2) and close(2).
>
> It's a monitoring box, the main thing causing load is librenms (php
> web + cli programs). The main load generator is the every-5-mins
> polling, this is a php cli program run from cron which calls
> snmpget and snmpbulkwalk to gather stats, using them to update
> mariadb and rrdtool databases (mariadb/rrdtool are running on the
> same machine).
>
> It also runs smokeping (fping and again rrdtool updates but on a much
> smaller scale) and icinga2 (various small check programs, mariadb
> updates, redis updates) but those are lighter weight (and run by their
> own schedulers with time delays spreading around more, rather than
> being directly wall-clock synced as with librenms).
Thanks.
> hw.ncpu=24 hw.ncpuonline=12 (Xeon E5-2630 v2).
I wonder if you don't have a piece of software that create a number of
threads based on ncpu.
> > If fstat(2) is done on a pipe, socket or kqueue the updated diff below
> > might help. It pushes the KERNEL_LOCK() down to vn_statfile() in this
> > syscall.
>
> Attached profile was created with this diff (machine had been up for
> a few hours before I did that).
>
> I then wondered what was running fstat(2) and tried ktrace on one of
> the php librenms poller processes to see if it was that - quickly
> resulting in a crash with
>
> *cpu1: kernel diagnostic assertion "_kernel_lock_held()" failed: file
> "/sys/kern/kern_ktrace.c", line 660
Could you send me the picture of the trace once cpu1? The one below is
missing the interesting bits where the panic comes frome.
> I haven't tried to trigger again but I suspect it will not be too
> difficult - probably best if I don't trash the filesystems on this
> box too badly if possible ;) If you can't reproduce it let me know
> and I can try to provoke it on a WITNESS kernel.
>
> (I didn't try ktrace with the earlier diff before fstat unlock).
It's certainly a missing KERNEL_LOCK() dance around some ktrace in my
diff.
> I have screenshots with the rest of the details from traces if
> wanted, didn't upload them yet. I didn't save other bits from ddb
> this time (no 'show all procs', etc).
>
> I'll leave it running overnight with that same kernel including the
> fstat unlock and see if I run into any other issues (without
> poking it with ktrace).
Thanks.
> WaAnRiNcI:N Gk:e rSnPeLl NdOiTa gLnOoWsEtRiEcD aOsNs eSrYtSiCoAnL L"
> _5k e-d
> 3n3e1l8_2l9o7c6k0_ hEeXlIdT( )0" 9f
> iStopped at savectx a
> +0xae: movl $0,%gs:0x680
>
> procs on cpu: (PRFLAGS all 0x2, PFLAGS 0 except for rrdtool 0x1)
> 5 snmpget
> 4 snmpget
> 2 snmpbulkwalk
> 6 snmpbulkwalk
> * 3 snmpget
> 7K snmpbulkwalk
> 10 snmpget
> 8 snmpget
> 11 snmpget
> 0 snmpbulkwalk
> 9 php-8.3
> 1 rrdtool
>
> ddb{3}> show panic
> *cpu1: kernel diagnostic asserton "_kernel_lock_held()" failed: file
> "/sys/kern/kern_ktrace.c", line 660
> ddb{3}> tr
> savectx() at savectx+0xae
> end of kernel
> ...
> ddb{0} tr
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> end of kernel
>
> ddb{1} tr
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> x86_bus_space_io_write_1+0x1d
> pcdisplay_cursor+0xa6
> wsemul_vt100_output+0xc7
> wsdisplay_cnputc+0x65
> cnputc+0x3b
> db_putchar+0x485
> kprintf+0x1351
> db_printf+0x6d
> panic+0xc2
> __asert+0x29
> ktrwriteraw+0x28f
>
> ddb{2} tr
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> alltraps+0x135
> end of kernel
>
> ddb{3} -> above
>
> ddb{4} tr
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> _kernel_lock+0xb2
> doopenat+0x1ef
> syscall+0x620
> Xsyscall+0x128
> end of kernel
>
> ddb{5} tr
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> _kernel_lock+0xa0
> sys_fstat+0x98
> syscall+0x620
> Xsyscall+0x128
> end of kernel
>
> ddb{6} tr
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> pagezero+0x21
> uvm_pagealloc+0xe2
> uvm_fault_lower+0x159
> uvm_fault+0x1f7
> upageflttrap+0x6c
> usertrap+0x217
> recall_trap+0x8
>
> ddb{7} tr
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> uvm_pmr_cache_put+0x206
> uvm_km_pgremove_intrsafe+0x48
> uvm_unmap_kill_entry_withlock+0xf2
> uvm_unmap_remove+0x32f
> uvm_unmap+0x7d
> free+0x172
> ufs_readdir+0x2c7
> VOP_READDIR+0x46
> sys_getdents+0x17a
> syscall+0x5e7
> Xsyscall+0x128
> end of kernel
>
> ddb{8} tr
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> _kernel_lock+0xa9
> doopenat+0x1ef
> syscall+0x620
> Xsyscall+0x128
> end of kernel
>
> ddb{9} tr
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> uvm_fault+0x4c
> upageflttrap+0x6c
> usertrap+0x217
> recall_trap+0x8
> end of kernel
>
> ddb{10} tr
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> krernel_lock+0xa6
> vn_read+0x44
> dofilereadv+0x163
> sys_read+0x55
> syscall+0x620
> Xsyscall+0x128
> end of kernel
>
> ddb{11} tr
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> end of kernel
>
> ddb{12} tr, same for 12-23 (smp-disabled)
> x86_ipi_db+0x16
> x86_ipi_handler+0x80
> Xresume_lapic_ipi+0x27
> acpicpu_idle+0x239
> end of kernel
>
btrace, identifying syscall+0x5c4