sparc64 pmap (was: Re: Please test: parallel fault handling)
On Mon, May 26, 2025 at 09:23:02AM +0200, Jeremie Courreges-Anglas wrote:
> On Mon, May 26, 2025 at 12:04:59AM +0200, Mark Kettenis wrote:
> > > Date: Sun, 25 May 2025 23:20:46 +0200
> > > From: Jeremie Courreges-Anglas <jca@wxcvbn.org>
> > >
> > > On Thu, May 22, 2025 at 08:19:38PM +0200, Mark Kettenis wrote:
> > > > > Date: Thu, 22 May 2025 18:54:08 +0200
> > > > > From: Jeremie Courreges-Anglas <jca@wxcvbn.org>
> > > [...]
> > > > > *Bzzzt*
> > > > >
> > > > > The same LDOM was busy compiling two devel/llvm copies under dpb(1).
> > > > > Input welcome, I'm not sure yet what other ddb commands could help.
> > > > >
> > > > > login: panic: trap type 0x34 (mem address not aligned): pc=1012f68 npc=1012f6c pstate=820006<PRIV,IE>
> > > > > Stopped at db_enter+0x8: nop
> > > > > TID PID UID PRFLAGS PFLAGS CPU COMMAND
> > > > > 57488 1522 0 0x11 0 1 perl
> > > > > 435923 9891 55 0x1000002 0 4 cc1plus
> > > > > 135860 36368 55 0x1000002 0 13 cc1plus
> > > > > 333743 96489 55 0x1000002 0 0 cc1plus
> > > > > 433162 55422 55 0x1000002 0 9 cc1plus
> > > > > 171658 49723 55 0x1000002 0 5 cc1plus
> > > > > 47127 57536 55 0x1000002 0 10 cc1plus
> > > > > 56600 9350 55 0x1000002 0 14 cc1plus
> > > > > 159792 13842 55 0x1000002 0 6 cc1plus
> > > > > 510019 10312 55 0x1000002 0 8 cc1plus
> > > > > 20489 65709 55 0x1000002 0 15 cc1plus
> > > > > 337455 42430 55 0x1000002 0 12 cc1plus
> > > > > 401407 80906 55 0x1000002 0 11 cc1plus
> > > > > 22993 62317 55 0x1000002 0 2 cc1plus
> > > > > 114916 17058 55 0x1000002 0 7 cc1plus
> > > > > *435412 33034 0 0x14000 0x200 3K pagedaemon
> > > > > trap(400fe6b19b0, 34, 1012f68, 820006, 3, 42) at trap+0x334
> > > > > Lslowtrap_reenter(40015a58a00, 77b5db2000, deadbeefdeadc0c7, 1d8, 2df0fc468, 468) at Lslowtrap_reenter+0xf8
> > > > > pmap_page_protect(40010716ab8, c16, 1cc9860, 193dfa0, 1cc9000, 1cc9000) at pmap_page_protect+0x1fc
> > > > > uvm_pagedeactivate(40010716a50, 40015a50d24, 18667a0, 0, 0, 1c8dac0) at uvm_pagedeactivate+0x54
> > > > > uvmpd_scan_active(0, 0, 270f2, 18667a0, 0, ffffffffffffffff) at uvmpd_scan_active+0x150
> > > > > uvm_pageout(400fe6b1e08, 55555556, 18667a0, 1c83f08, 1c83000, 1c8dc18) at uvm_pageout+0x2dc
> > > > > proc_trampoline(0, 0, 0, 0, 0, 0) at proc_trampoline+0x10
> > > > > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > > > > reports. Insufficient info makes it difficult to find and fix bugs.
> > > >
> > > > If there are pmap issues, pmap_page_protect() is certainly the first
> > > > place I'd look. I'll start looking, but don't expect to have much
> > > > time until after monday.
> > >
> > > Indeed this crash lies in pmap_page_protect(). llvm-objdump -dlS says
> > > it's stopped at l.2499:
> > >
> > > } else {
> > > pv_entry_t firstpv;
> > > /* remove mappings */
> > >
> > > firstpv = pa_to_pvh(pa);
> > > mtx_enter(&pg->mdpage.pvmtx);
> > >
> > > /* First remove the entire list of continuation pv's*/
> > > while ((pv = firstpv->pv_next) != NULL) {
> > > --> data = pseg_get(pv->pv_pmap, pv->pv_va & PV_VAMASK);
> > >
> > > /* Save REF/MOD info */
> > > firstpv->pv_va |= pmap_tte2flags(data);
> > >
> > > ; /sys/arch/sparc64/sparc64/pmap.c:2499
> > > ; data = pseg_get(pv->pv_pmap, pv->pv_va & PV_VAMASK);
> > > 3c10: a7 29 30 0d sllx %g4, 13, %l3
> > > 3c14: d2 5c 60 10 ldx [%l1+16], %o1
> > > 3c18: d0 5c 60 08 ldx [%l1+8], %o0
> > > --> 3c1c: 40 00 00 00 call 0
> > > 3c20: 92 0a 40 13 and %o1, %l3, %o1
> > >
> > > As discussed with miod I suspect the crash actually lies inside
> > > pseg_get(), but I can't prove it.
> >
> > Can you try the diff below. I've nuked pmap_collect() on other
> > architectures since it is hard to make "mpsafe". The current sparc64
> > certainly isn't.
> >
> > If this fixes the crashes, I can see if it is possible to make it
> > "mpsafe" on sparc64.
>
> I see. Indeed this function stood out when I looked at the file. Can
> I really just empty it out like this, or should I also drop
> __HAVE_PMAP_COLLECT?
>
> Yesterday the machine crashed again in a nearby place in
> pmap_page_protect().
>
> login: pmap_page_protect: gotten pseg empty!
> Stopped at pmap_page_protect+0x620: nop
> ddb{4}> tr
> uvm_pagedeactivate(4000d1510a0, 19d2ce0, 193edd0, 40015a507e4, 1, 1) at uvm_pagedeactivate+0x54
> uvn_flush(401053a3310, 0, 11c8000, 14, 1, 0) at uvn_flush+0x448
> uvn_detach(401053a3310, 40101633630, 1, 0, 0, 1000000) at uvn_detach+0x158
> uvm_unmap_detach(400fe6b5c68, 0, 9a7, 40015a507e4, 40015a507e4, 18f7828) at uvm_unmap_detach+0x68
> uvm_map_teardown(400130912e0, 4000, 1939240, 4010599c000, 54d, 17b2ff8) at uvm_map_teardown+0x184
> uvmspace_free(1cd7000, 400130912e0, 4, 17b2ff8, 0, 6) at uvmspace_free+0x64
> reaper(40015a507e0, 40015a507e0, db, 100, 1c04038, 4000) at reaper+0x100
> proc_trampoline(0, 0, 0, 0, 0, 0) at proc_trampoline+0x10
> ddb{4}>
>
> This time it's pmap.c l.2534:
>
> /* Then remove the primary pv */
> if (pv->pv_pmap != NULL) {
> data = pseg_get(pv->pv_pmap, pv->pv_va & PV_VAMASK);
>
> /* Save REF/MOD info */
> pv->pv_va |= pmap_tte2flags(data);
> if (pseg_set(pv->pv_pmap, pv->pv_va & PV_VAMASK, 0, 0)) {
> printf("pmap_page_protect: gotten pseg empty!\n");
> --> db_enter();
> /* panic? */
> }
>
> It's not the first crash with uvmspace_free() going through
> uvm_pagedeactivate(), so I agree with you that a fix is probably
> needed for mpi's uvm_purge() diff (which I haven't tested yet on that
> box).
FWIW pmap_page_protect() itself looks like a potential problem, as you
suspected. The diff below locks the pmaps in pmap_page_protect(PROT_NONE),
the code path used by uvm_pagedeactivate().
Walking the pv list and locking each pmap introduces a lock ordering
constraint, something that the arm64 and riscv64 implementations handle
with a slightly convoluted approach. Surely sparc64 should have its
pseg_get/pseg_set calls protected by the pmap lock too. Now, sparc64 is
special in several regards: each mapping must be checked for mod/ref
information by accessing psegs, pv list handling is inlined, and the
first element of the pv list is embedded in struct mdpage.
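For reference, here is my loose recollection of the arm64/riscv64 idiom,
transposed to the sparc64 names used below. This is a paraphrase, not
the actual code, and it glosses over details such as keeping the pmap
itself alive across the unlocked window:

	mtx_enter(&pg->mdpage.pvmtx);
	while ((pv = firstpv->pv_next) != NULL) {
		struct pmap *pm = pv->pv_pmap;

		/* Drop the pv lock before taking the pmap lock. */
		mtx_leave(&pg->mdpage.pvmtx);
		mtx_enter(&pm->pm_mtx);
		mtx_enter(&pg->mdpage.pvmtx);

		/*
		 * The entry may have been removed or reused while no
		 * lock was held. If pv is still at the head of the
		 * list it has not been freed, so the second check is
		 * safe.
		 */
		if (pv != firstpv->pv_next || pv->pv_pmap != pm) {
			mtx_leave(&pm->pm_mtx);
			continue;
		}

		/* ... unlink pv and clear the mapping, both locks held ... */

		mtx_leave(&pm->pm_mtx);
	}
	mtx_leave(&pg->mdpage.pvmtx);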
The diff below avoids that retry dance: it drops the current pv entry
from the list before acting on it, thus avoiding concurrent access and
the "start all over again" idiom. For the "first" pv entry, we can make
a copy of it so that we can drop the pv lock. It could be argued that
this static pv header/entry makes things more complex than they need to
be.
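In distilled form, with the mod/ref bookkeeping elided, the loop in the
diff becomes:

	mtx_enter(&pg->mdpage.pvmtx);
	while ((pv = firstpv->pv_next) != NULL) {
		/* Unlink first: nobody else can reach pv anymore. */
		firstpv->pv_next = pv->pv_next;
		pv->pv_next = NULL;

		/* Now the pmap lock can be taken without the pv lock. */
		mtx_leave(&pg->mdpage.pvmtx);
		mtx_enter(&pv->pv_pmap->pm_mtx);
		/* ... pseg_get()/pseg_set() under the pmap lock ... */
		mtx_leave(&pv->pv_pmap->pm_mtx);

		pool_put(&pv_pool, pv);
		mtx_enter(&pg->mdpage.pvmtx);
	}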
The code keeps the mod/ref accounting. But does it make sense to keep
the mod/ref information at all if we're flushing/destroying all the
mappings to a page? If it is indeed pointless, which I'm uncertain
about, we could drop the pseg_get/mtx_enter/etc chunks. Performance
with the diff below is similar to performance without it (both measured
with the uvm parallel fault diff applied).
In this very function there's the "lower protection" code path that
walks the pv list and accesses psegs. I'm not sure yet how to tackle
it, but maybe we don't need to handle it yet?
Thanks to miod for shedding some light on this pmap implementation over
the weekend (he hasn't seen the diff yet).
Input welcome.
Index: pmap.c
===================================================================
RCS file: /cvs/src/sys/arch/sparc64/sparc64/pmap.c,v
diff -u -p -r1.121 pmap.c
--- pmap.c 26 Jun 2024 01:40:49 -0000 1.121
+++ pmap.c 26 May 2025 08:26:39 -0000
@@ -2489,6 +2489,7 @@ pmap_page_protect(struct vm_page *pg, vm
mtx_leave(&pg->mdpage.pvmtx);
} else {
pv_entry_t firstpv;
+ struct pv_entry pve;
/* remove mappings */
firstpv = pa_to_pvh(pa);
@@ -2496,10 +2497,19 @@ pmap_page_protect(struct vm_page *pg, vm
/* First remove the entire list of continuation pv's*/
while ((pv = firstpv->pv_next) != NULL) {
- data = pseg_get(pv->pv_pmap, pv->pv_va & PV_VAMASK);
+ /* Remove item from the list, avoids concurrent calls */
+ firstpv->pv_next = pv->pv_next;
+ pv->pv_next = NULL;
+
+ /* Unlock the pv lock to avoid ordering problems */
+ mtx_leave(&pg->mdpage.pvmtx);
+ mtx_enter(&pv->pv_pmap->pm_mtx);
/* Save REF/MOD info */
+ data = pseg_get(pv->pv_pmap, pv->pv_va & PV_VAMASK);
+ mtx_enter(&pg->mdpage.pvmtx);
firstpv->pv_va |= pmap_tte2flags(data);
+ mtx_leave(&pg->mdpage.pvmtx);
/* Clear mapping */
if (pseg_set(pv->pv_pmap, pv->pv_va & PV_VAMASK, 0, 0)) {
@@ -2507,6 +2517,8 @@ pmap_page_protect(struct vm_page *pg, vm
db_enter();
/* panic? */
}
+ mtx_leave(&pv->pv_pmap->pm_mtx);
+
if (pv->pv_pmap->pm_ctx || pv->pv_pmap == pmap_kernel()) {
tsb_invalidate(pv->pv_pmap->pm_ctx,
(pv->pv_va & PV_VAMASK));
@@ -2515,25 +2527,38 @@ pmap_page_protect(struct vm_page *pg, vm
atomic_dec_long(&pv->pv_pmap->pm_stats.resident_count);
/* free the pv */
- firstpv->pv_next = pv->pv_next;
- mtx_leave(&pg->mdpage.pvmtx);
pool_put(&pv_pool, pv);
mtx_enter(&pg->mdpage.pvmtx);
}
- pv = firstpv;
+ /*
+ * Make a local copy of the primary pv data to guard
+ * against concurrent calls.
+ */
+ pv = NULL;
+ if (firstpv->pv_pmap != NULL) {
+ pve = *firstpv;
+ pv = &pve;
+ firstpv->pv_pmap = NULL;
+ }
+ mtx_leave(&pg->mdpage.pvmtx);
+
+ if (pv != NULL) {
+ mtx_enter(&pv->pv_pmap->pm_mtx);
- /* Then remove the primary pv */
- if (pv->pv_pmap != NULL) {
+ /* Save REF/MOD info */
data = pseg_get(pv->pv_pmap, pv->pv_va & PV_VAMASK);
+ mtx_enter(&pg->mdpage.pvmtx);
+ firstpv->pv_va |= pmap_tte2flags(data);
+ mtx_leave(&pg->mdpage.pvmtx);
- /* Save REF/MOD info */
- pv->pv_va |= pmap_tte2flags(data);
+ /* Clear mapping */
if (pseg_set(pv->pv_pmap, pv->pv_va & PV_VAMASK, 0, 0)) {
printf("pmap_page_protect: gotten pseg empty!\n");
db_enter();
/* panic? */
}
+ mtx_leave(&pv->pv_pmap->pm_mtx);
if (pv->pv_pmap->pm_ctx || pv->pv_pmap == pmap_kernel()) {
tsb_invalidate(pv->pv_pmap->pm_ctx,
(pv->pv_va & PV_VAMASK));
@@ -2541,13 +2566,9 @@ pmap_page_protect(struct vm_page *pg, vm
pv->pv_pmap->pm_ctx);
}
atomic_dec_long(&pv->pv_pmap->pm_stats.resident_count);
-
- KASSERT(pv->pv_next == NULL);
- /* dump the first pv */
- pv->pv_pmap = NULL;
}
+
dcache_flush_page(pa);
- mtx_leave(&pg->mdpage.pvmtx);
}
/* We should really only flush the pages we demapped. */
}
--
jca