From: Martin Pieuchot Subject: Re: btrace, identifying syscall+0x5c4 To: tech Date: Tue, 3 Dec 2024 08:27:48 +0100 On 02/12/24(Mon) 23:43, Stuart Henderson wrote: > On 2024/12/02 17:47, Martin Pieuchot wrote: > > On 02/12/24(Mon) 14:49, Stuart Henderson wrote: > > > On 2024/12/02 14:35, Stuart Henderson wrote: > > > > On 2024/12/02 15:12, Claudio Jeker wrote: > > > > > On Mon, Dec 02, 2024 at 01:53:02PM +0000, Stuart Henderson wrote: > > > > > > Thought I'd take a look at a kernel profile from a machine which is > > > > > > fairly "interesting" in terms of kernel spin. (Running -current). > > > > > > > > > > > > In the attached output from running "btrace kprofile.bt" I have a > > > > > > lot of time accounted for under syscall+0x5c4, which has _kernel_lock > > > > > > directly next to it in the traces, and I was wondering what that is. > > > > > > Bit confused by disasm of trap.o but my guess is that it might be > > > > > > a lock relating to dt(4) itself, is that right? > > > > > > > > > > I think this is the bit of syscall that grabs the kernel lock for any > > > > > syscall not marked NOLOCK. In your case this is mostly fstat*, open* > > > > > or the bits under syscall+0x5e7. > > > > > > > > Thank you. > > > > > > > > > Your system does a lot of parallel disk IO and that is all still using the > > > > > KERNEL_LOCK. > > > > > > > > oh, I should try "VFS syscalls & KERNEL_LOCK()" on this machine! > > > > > > ...and fwiw that looks like this: > > > > Interesting. What kind of workload is that? You have the same amount > > of contention spread across open(2), fstat(2), read(2) and close(2). > > It's a monitoring box, the main thing causing load is librenms (php > web + cli programs). The main load generator is the every-5-mins > polling, this is a php cli program run from cron which calls > snmpget and snmpbulkwalk to gather stats, using them to update > mariadb and rrdtool databases (mariadb/rrdtool are running on the > same machine). > > It also runs smokeping (fping and again rrdtool updates but on a much > smaller scale) and icinga2 (various small check programs, mariadb > updates, redis updates) but those are lighter weight (and run by their > own schedulers with time delays spreading around more, rather than > being directly wall-clock synced as with librenms). Thanks. > hw.ncpu=24 hw.ncpuonline=12 (Xeon E5-2630 v2). I wonder if you don't have a piece of software that create a number of threads based on ncpu. > > If fstat(2) is done on a pipe, socket or kqueue the updated diff below > > might help. It pushes the KERNEL_LOCK() down to vn_statfile() in this > > syscall. > > Attached profile was created with this diff (machine had been up for > a few hours before I did that). > > I then wondered what was running fstat(2) and tried ktrace on one of > the php librenms poller processes to see if it was that - quickly > resulting in a crash with > > *cpu1: kernel diagnostic assertion "_kernel_lock_held()" failed: file > "/sys/kern/kern_ktrace.c", line 660 Could you send me the picture of the trace once cpu1? The one below is missing the interesting bits where the panic comes frome. > I haven't tried to trigger again but I suspect it will not be too > difficult - probably best if I don't trash the filesystems on this > box too badly if possible ;) If you can't reproduce it let me know > and I can try to provoke it on a WITNESS kernel. > > (I didn't try ktrace with the earlier diff before fstat unlock). It's certainly a missing KERNEL_LOCK() dance around some ktrace in my diff. > I have screenshots with the rest of the details from traces if > wanted, didn't upload them yet. I didn't save other bits from ddb > this time (no 'show all procs', etc). > > I'll leave it running overnight with that same kernel including the > fstat unlock and see if I run into any other issues (without > poking it with ktrace). Thanks. > WaAnRiNcI:N Gk:e rSnPeLl NdOiTa gLnOoWsEtRiEcD aOsNs eSrYtSiCoAnL L" > _5k e-d > 3n3e1l8_2l9o7c6k0_ hEeXlIdT( )0" 9f > iStopped at savectx a > +0xae: movl $0,%gs:0x680 > > procs on cpu: (PRFLAGS all 0x2, PFLAGS 0 except for rrdtool 0x1) > 5 snmpget > 4 snmpget > 2 snmpbulkwalk > 6 snmpbulkwalk > * 3 snmpget > 7K snmpbulkwalk > 10 snmpget > 8 snmpget > 11 snmpget > 0 snmpbulkwalk > 9 php-8.3 > 1 rrdtool > > ddb{3}> show panic > *cpu1: kernel diagnostic asserton "_kernel_lock_held()" failed: file > "/sys/kern/kern_ktrace.c", line 660 > ddb{3}> tr > savectx() at savectx+0xae > end of kernel > ... > ddb{0} tr > x86_ipi_db+0x16 > x86_ipi_handler+0x80 > Xresume_lapic_ipi+0x27 > end of kernel > > ddb{1} tr > x86_ipi_db+0x16 > x86_ipi_handler+0x80 > Xresume_lapic_ipi+0x27 > x86_bus_space_io_write_1+0x1d > pcdisplay_cursor+0xa6 > wsemul_vt100_output+0xc7 > wsdisplay_cnputc+0x65 > cnputc+0x3b > db_putchar+0x485 > kprintf+0x1351 > db_printf+0x6d > panic+0xc2 > __asert+0x29 > ktrwriteraw+0x28f > > ddb{2} tr > x86_ipi_db+0x16 > x86_ipi_handler+0x80 > Xresume_lapic_ipi+0x27 > alltraps+0x135 > end of kernel > > ddb{3} -> above > > ddb{4} tr > x86_ipi_db+0x16 > x86_ipi_handler+0x80 > Xresume_lapic_ipi+0x27 > _kernel_lock+0xb2 > doopenat+0x1ef > syscall+0x620 > Xsyscall+0x128 > end of kernel > > ddb{5} tr > x86_ipi_db+0x16 > x86_ipi_handler+0x80 > Xresume_lapic_ipi+0x27 > _kernel_lock+0xa0 > sys_fstat+0x98 > syscall+0x620 > Xsyscall+0x128 > end of kernel > > ddb{6} tr > x86_ipi_db+0x16 > x86_ipi_handler+0x80 > Xresume_lapic_ipi+0x27 > pagezero+0x21 > uvm_pagealloc+0xe2 > uvm_fault_lower+0x159 > uvm_fault+0x1f7 > upageflttrap+0x6c > usertrap+0x217 > recall_trap+0x8 > > ddb{7} tr > x86_ipi_db+0x16 > x86_ipi_handler+0x80 > Xresume_lapic_ipi+0x27 > uvm_pmr_cache_put+0x206 > uvm_km_pgremove_intrsafe+0x48 > uvm_unmap_kill_entry_withlock+0xf2 > uvm_unmap_remove+0x32f > uvm_unmap+0x7d > free+0x172 > ufs_readdir+0x2c7 > VOP_READDIR+0x46 > sys_getdents+0x17a > syscall+0x5e7 > Xsyscall+0x128 > end of kernel > > ddb{8} tr > x86_ipi_db+0x16 > x86_ipi_handler+0x80 > Xresume_lapic_ipi+0x27 > _kernel_lock+0xa9 > doopenat+0x1ef > syscall+0x620 > Xsyscall+0x128 > end of kernel > > ddb{9} tr > x86_ipi_db+0x16 > x86_ipi_handler+0x80 > Xresume_lapic_ipi+0x27 > uvm_fault+0x4c > upageflttrap+0x6c > usertrap+0x217 > recall_trap+0x8 > end of kernel > > ddb{10} tr > x86_ipi_db+0x16 > x86_ipi_handler+0x80 > Xresume_lapic_ipi+0x27 > krernel_lock+0xa6 > vn_read+0x44 > dofilereadv+0x163 > sys_read+0x55 > syscall+0x620 > Xsyscall+0x128 > end of kernel > > ddb{11} tr > x86_ipi_db+0x16 > x86_ipi_handler+0x80 > Xresume_lapic_ipi+0x27 > end of kernel > > ddb{12} tr, same for 12-23 (smp-disabled) > x86_ipi_db+0x16 > x86_ipi_handler+0x80 > Xresume_lapic_ipi+0x27 > acpicpu_idle+0x239 > end of kernel >