Please test: parallel fault handling
On Thu, May 22, 2025 at 08:19:38PM +0200, Mark Kettenis wrote:
> > Date: Thu, 22 May 2025 18:54:08 +0200
> > From: Jeremie Courreges-Anglas <jca@wxcvbn.org>
[...]
> > *Bzzzt*
> >
> > The same LDOM was busy compiling two devel/llvm copies under dpb(1).
> > Input welcome, I'm not sure yet what other ddb commands could help.
> >
> > login: panic: trap type 0x34 (mem address not aligned): pc=1012f68 npc=1012f6c pstate=820006<PRIV,IE>
> > Stopped at db_enter+0x8: nop
> > TID PID UID PRFLAGS PFLAGS CPU COMMAND
> > 57488 1522 0 0x11 0 1 perl
> > 435923 9891 55 0x1000002 0 4 cc1plus
> > 135860 36368 55 0x1000002 0 13 cc1plus
> > 333743 96489 55 0x1000002 0 0 cc1plus
> > 433162 55422 55 0x1000002 0 9 cc1plus
> > 171658 49723 55 0x1000002 0 5 cc1plus
> > 47127 57536 55 0x1000002 0 10 cc1plus
> > 56600 9350 55 0x1000002 0 14 cc1plus
> > 159792 13842 55 0x1000002 0 6 cc1plus
> > 510019 10312 55 0x1000002 0 8 cc1plus
> > 20489 65709 55 0x1000002 0 15 cc1plus
> > 337455 42430 55 0x1000002 0 12 cc1plus
> > 401407 80906 55 0x1000002 0 11 cc1plus
> > 22993 62317 55 0x1000002 0 2 cc1plus
> > 114916 17058 55 0x1000002 0 7 cc1plus
> > *435412 33034 0 0x14000 0x200 3K pagedaemon
> > trap(400fe6b19b0, 34, 1012f68, 820006, 3, 42) at trap+0x334
> > Lslowtrap_reenter(40015a58a00, 77b5db2000, deadbeefdeadc0c7, 1d8, 2df0fc468, 468) at Lslowtrap_reenter+0xf8
> > pmap_page_protect(40010716ab8, c16, 1cc9860, 193dfa0, 1cc9000, 1cc9000) at pmap_page_protect+0x1fc
> > uvm_pagedeactivate(40010716a50, 40015a50d24, 18667a0, 0, 0, 1c8dac0) at uvm_pagedeactivate+0x54
> > uvmpd_scan_active(0, 0, 270f2, 18667a0, 0, ffffffffffffffff) at uvmpd_scan_active+0x150
> > uvm_pageout(400fe6b1e08, 55555556, 18667a0, 1c83f08, 1c83000, 1c8dc18) at uvm_pageout+0x2dc
> > proc_trampoline(0, 0, 0, 0, 0, 0) at proc_trampoline+0x10
> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > reports. Insufficient info makes it difficult to find and fix bugs.
>
> If there are pmap issues, pmap_page_protect() is certainly the first
> place I'd look. I'll start looking, but don't expect to have much
> time until after monday.
Indeed this crash lies in pmap_page_protect(). llvm-objdump -dlS says
it's stopped at l.2499:
    } else {
        pv_entry_t firstpv;
        /* remove mappings */
        firstpv = pa_to_pvh(pa);
        mtx_enter(&pg->mdpage.pvmtx);
        /* First remove the entire list of continuation pv's*/
        while ((pv = firstpv->pv_next) != NULL) {
-->         data = pseg_get(pv->pv_pmap, pv->pv_va & PV_VAMASK);
            /* Save REF/MOD info */
            firstpv->pv_va |= pmap_tte2flags(data);
; /sys/arch/sparc64/sparc64/pmap.c:2499
;           data = pseg_get(pv->pv_pmap, pv->pv_va & PV_VAMASK);
    3c10: a7 29 30 0d   sllx %g4, 13, %l3
    3c14: d2 5c 60 10   ldx [%l1+16], %o1
    3c18: d0 5c 60 08   ldx [%l1+8], %o0
--> 3c1c: 40 00 00 00   call 0
    3c20: 92 0a 40 13   and %o1, %l3, %o1
As discussed with miod I suspect the crash actually lies inside
pseg_get(), but I can't prove it.
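
For readers following along: trap type 0x34 is the SPARC V9
mem_address_not_aligned trap, and an 8-byte load such as ldx needs an
8-byte-aligned address. The sketch below is only an illustration of that
check. The helper names is_aligned() and misalignment() are made up for
this example and are not part of pmap.c. The trace does not establish
which address actually faulted; the value deadbeefdeadc0c7 is merely one
of the register values printed in the Lslowtrap_reenter frame.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not from the OpenBSD tree.
 * align must be a power of two (8 for an ldx/stx access).
 */

/* Nonzero if addr is a multiple of align. */
static int
is_aligned(uint64_t addr, uint64_t align)
{
	return (addr & (align - 1)) == 0;
}

/* Number of bytes by which addr is past the previous align boundary. */
static uint64_t
misalignment(uint64_t addr, uint64_t align)
{
	return addr & (align - 1);
}
```

With these, is_aligned(0xdeadbeefdeadc0c7ULL, 8) is false (the low three
bits are all set), while a value like 0x40010716ab8ULL from the
pmap_page_protect frame passes the same check. That only says such a
value could not be used as the address of an ldx; it does not prove
where it came from.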
--
jca