Please test: parallel fault handling
On Mon, May 19, 2025 at 09:23:27PM +0200, Jeremie Courreges-Anglas wrote:
> On Tue, May 13, 2025 at 02:28:08PM +0200, Martin Pieuchot wrote:
> > On 13/05/25(Tue) 13:57, Jeremie Courreges-Anglas wrote:
> [...]
> > > The sparc64 LDOM hit two panics - I somehow managed to break it
> > > out of ddb after the first panic due to a bogus conserver setup.
> > > The data below is minimal, sorry: I didn't recover the full trace
> > > from dmesg after the 1st panic, and 'mach ddbcpu 0' locked up
> > > during the 2nd panic.
> >
> > I wonder if sparc64's pmap is in need of some love...
> >
> > >
> > > I had not run a sparc64 bulk build on this machine in recent
> > > months, and I don't know whether both panics are related to your
> > > diff, but I'll do my best to try to reproduce them. For now I've
> > > restarted the bulk build without the uvm change. Maybe someone
> > > with more sparc64 knowledge can shed some light here.
> >
> > Please let me know.
>
> I can already tell that this LDOM ran the rest of the bulk with the
> parallel uvm fault diff reverted... Now restarting another build from
> scratch, with your diff on top of -current.
>
> > You can also start by "ps /o" or "show proc".
>
> Hopefully that will be enough! 'mach ddbcpu x' didn't seem very usable.
*Bzzzt*
The same LDOM was busy compiling two devel/llvm copies under dpb(1).
Input welcome; I'm not sure yet which other ddb commands could help.
login: panic: trap type 0x34 (mem address not aligned): pc=1012f68 npc=1012f6c pstate=820006<PRIV,IE>
Stopped at db_enter+0x8: nop
trap(400fe6b19b0, 34, 1012f68, 820006, 3, 42) at trap+0x334
Lslowtrap_reenter(40015a58a00, 77b5db2000, deadbeefdeadc0c7, 1d8, 2df0fc468, 468) at Lslowtrap_reenter+0xf8
pmap_page_protect(40010716ab8, c16, 1cc9860, 193dfa0, 1cc9000, 1cc9000) at pmap_page_protect+0x1fc
uvm_pagedeactivate(40010716a50, 40015a50d24, 18667a0, 0, 0, 1c8dac0) at uvm_pagedeactivate+0x54
uvmpd_scan_active(0, 0, 270f2, 18667a0, 0, ffffffffffffffff) at uvmpd_scan_active+0x150
uvm_pageout(400fe6b1e08, 55555556, 18667a0, 1c83f08, 1c83000, 1c8dc18) at uvm_pageout+0x2dc
proc_trampoline(0, 0, 0, 0, 0, 0) at proc_trampoline+0x10
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports. Insufficient info makes it difficult to find and fix bugs.
ddb{3}>
ddb{3}>
ddb{3}> ps /o
TID PID UID PRFLAGS PFLAGS CPU COMMAND
57488 1522 0 0x11 0 1 perl
435923 9891 55 0x1000002 0 4 cc1plus
135860 36368 55 0x1000002 0 13 cc1plus
333743 96489 55 0x1000002 0 0 cc1plus
433162 55422 55 0x1000002 0 9 cc1plus
171658 49723 55 0x1000002 0 5 cc1plus
47127 57536 55 0x1000002 0 10 cc1plus
56600 9350 55 0x1000002 0 14 cc1plus
159792 13842 55 0x1000002 0 6 cc1plus
510019 10312 55 0x1000002 0 8 cc1plus
20489 65709 55 0x1000002 0 15 cc1plus
337455 42430 55 0x1000002 0 12 cc1plus
401407 80906 55 0x1000002 0 11 cc1plus
22993 62317 55 0x1000002 0 2 cc1plus
114916 17058 55 0x1000002 0 7 cc1plus
*435412 33034 0 0x14000 0x200 3K pagedaemon
ddb{3}> show proc
PROC (pagedaemon) tid=435412 pid=33034 tcnt=1 stat=onproc
flags process=14000<NOZOMBIE,SYSTEM> proc=200<SYSTEM>
runpri=4, usrpri=50, slppri=4, nice=20
wchan=0x0, wmesg=, ps_single=0x0 scnt=0 ecnt=0
forw=0xffffffffffffffff, list=0x40015a50fc0,0x40015a50a90
process=0x403c1f0ac70 user=0x400fe6ae000, vmspace=0x1c8e2c0
estcpu=0, cpticks=0, pctcpu=0.0, user=0, sys=0, intr=0
ddb{3}> show uvm
Current UVM status:
pagesize=8192 (0x2000), pagemask=0x1fff, pageshift=13
2057215 VM pages: 1446170 active, 104107 inactive, 1 wired, 91303 free (91303 zero)
freemin=68573, free-target=91430, inactive-target=516759, wired-max=685738
faults=-1646139234, traps=-1346997425, intrs=709572514, ctxswitch=368458634 fpuswitch=7786655
softint=86318616, syscalls=-1689153682, kmapent=12
fault counts:
noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
relocks=6994201(335164), upgrades=1050667241(9720) anget(retries)=1162622063(0), amapcopy=302352908
neighbor anon/obj pg=403857156/660172689, gets(lock/unlock)=442541030/7329374
cases: anon=884461767, anoncow=278160296, obj=382612954, prcopy=59590377, przero=1044001823
daemon and swap counts:
woke=153, revs=153, scans=0, obscans=0, anscans=0
busy=0, freed=0, reactivate=0, deactivate=411398
pageouts=0, pending=0, nswget=0
nswapdev=1
swpages=2130619, swpginuse=0, swpgonly=0 paging=0
kernel pointers:
objs(kern)=0x1c2d7a0
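
FWIW, trap type 0x34 is the sparc64 alignment fault, and the
deadbeefdeadc0c7 value in the trap frame looks like freed-memory
poison, so my guess (just a guess) is that pmap_page_protect() chased
a stale pv pointer. A userland toy, not kernel code, showing why a
load through such an odd address traps on sparc64:

    /*
     * Intentionally misaligned 64-bit load.  On sparc64 this dies
     * with SIGBUS ("mem address not aligned" in kernel mode); an odd
     * poison value like 0xdeadbeefdeadc0c7 used as a pointer faults
     * the same way.  On architectures that allow unaligned access
     * (e.g. amd64) it just prints whatever bytes are there.
     */
    #include <stdio.h>
    #include <stdint.h>

    int
    main(void)
    {
            uint64_t buf[2] = { 0, 0 };
            /* +7 makes the pointer odd, so not 8-byte aligned */
            uint64_t *p = (uint64_t *)((char *)buf + 7);
            printf("%llu\n", (unsigned long long)*p);
            return 0;
    }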
--
jca