From: Claudio Jeker
Subject: Re: Please test: parallel fault handling
To: Mark Kettenis, tech@openbsd.org
Date: Mon, 18 Aug 2025 14:02:32 +0200

On Thu, Aug 14, 2025 at 01:03:27PM +0200, Martin Pieuchot wrote:
> On 11/06/25(Wed) 13:14, Claudio Jeker wrote:
> > On Wed, Jun 11, 2025 at 12:12:58PM +0200, Mark Kettenis wrote:
> > > > Date: Mon, 9 Jun 2025 14:07:47 +0200
> > > > From: Claudio Jeker
> > > > 
> > > > On Mon, Jun 09, 2025 at 01:46:31PM +0200, Jeremie Courreges-Anglas wrote:
> > > > > On Tue, Jun 03, 2025 at 06:21:17PM +0200, Jeremie Courreges-Anglas wrote:
> > > > > > On Sun, May 25, 2025 at 11:20:46PM +0200, Jeremie Courreges-Anglas wrote:
> > > > > > > On Thu, May 22, 2025 at 08:19:38PM +0200, Mark Kettenis wrote:
> > > > > > > > > Date: Thu, 22 May 2025 18:54:08 +0200
> > > > > > > > > From: Jeremie Courreges-Anglas
> > > > > > > [...]
> > > > > > > > > *Bzzzt*
> > > > > > > > > 
> > > > > > > > > The same LDOM was busy compiling two devel/llvm copies under dpb(1).
> > > > > > > > > Input welcome, I'm not sure yet what other ddb commands could help.
> > > > > > > > > 
> > > > > > > > > login: panic: trap type 0x34 (mem address not aligned): pc=1012f68 npc=1012f6c pstate=820006
> > > > > > > > > Stopped at      db_enter+0x8:   nop
> > > > > > > > >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > > > > > > > >   57488   1522      0        0x11          0    1  perl
> > > > > > > > >  435923   9891     55   0x1000002          0    4  cc1plus
> > > > > > > > >  135860  36368     55   0x1000002          0   13  cc1plus
> > > > > > > > >  333743  96489     55   0x1000002          0    0  cc1plus
> > > > > > > > >  433162  55422     55   0x1000002          0    9  cc1plus
> > > > > > > > >  171658  49723     55   0x1000002          0    5  cc1plus
> > > > > > > > >   47127  57536     55   0x1000002          0   10  cc1plus
> > > > > > > > >   56600   9350     55   0x1000002          0   14  cc1plus
> > > > > > > > >  159792  13842     55   0x1000002          0    6  cc1plus
> > > > > > > > >  510019  10312     55   0x1000002          0    8  cc1plus
> > > > > > > > >   20489  65709     55   0x1000002          0   15  cc1plus
> > > > > > > > >  337455  42430     55   0x1000002          0   12  cc1plus
> > > > > > > > >  401407  80906     55   0x1000002          0   11  cc1plus
> > > > > > > > >   22993  62317     55   0x1000002          0    2  cc1plus
> > > > > > > > >  114916  17058     55   0x1000002          0    7  cc1plus
> > > > > > > > > *435412  33034      0     0x14000      0x200    3K pagedaemon
> > > > > > > > > trap(400fe6b19b0, 34, 1012f68, 820006, 3, 42) at trap+0x334
> > > > > > > > > Lslowtrap_reenter(40015a58a00, 77b5db2000, deadbeefdeadc0c7, 1d8, 2df0fc468, 468) at Lslowtrap_reenter+0xf8
> > > > > > > > > pmap_page_protect(40010716ab8, c16, 1cc9860, 193dfa0, 1cc9000, 1cc9000) at pmap_page_protect+0x1fc
> > > > > > > > > uvm_pagedeactivate(40010716a50, 40015a50d24, 18667a0, 0, 0, 1c8dac0) at uvm_pagedeactivate+0x54
> > > > > > > > > uvmpd_scan_active(0, 0, 270f2, 18667a0, 0, ffffffffffffffff) at uvmpd_scan_active+0x150
> > > > > > > > > uvm_pageout(400fe6b1e08, 55555556, 18667a0, 1c83f08, 1c83000, 1c8dc18) at uvm_pageout+0x2dc
> > > > > > > > > proc_trampoline(0, 0, 0, 0, 0, 0) at proc_trampoline+0x10
> > > > > > > > > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > > > > > > > > reports.  Insufficient info makes it difficult to find and fix bugs.
> > > > > > > > 
> > > > > > > > If there are pmap issues, pmap_page_protect() is certainly the first
> > > > > > > > place I'd look.  I'll start looking, but don't expect to have much
> > > > > > > > time until after monday.
> > > > > > > 
> > > > > > > Indeed this crash lies in pmap_page_protect().
> > > > > > > llvm-objdump -dlS says it's stopped at l.2499:
> > > > > > > 
> > > > > > > 	} else {
> > > > > > > 		pv_entry_t firstpv;
> > > > > > > 		/* remove mappings */
> > > > > > > 
> > > > > > > 		firstpv = pa_to_pvh(pa);
> > > > > > > 		mtx_enter(&pg->mdpage.pvmtx);
> > > > > > > 
> > > > > > > 		/* First remove the entire list of continuation pv's*/
> > > > > > > 		while ((pv = firstpv->pv_next) != NULL) {
> > > > > > > -->			data = pseg_get(pv->pv_pmap, pv->pv_va & PV_VAMASK);
> > > > > > > 
> > > > > > > 			/* Save REF/MOD info */
> > > > > > > 			firstpv->pv_va |= pmap_tte2flags(data);
> > > > > > > 
> > > > > > > ; /sys/arch/sparc64/sparc64/pmap.c:2499
> > > > > > > ;		data = pseg_get(pv->pv_pmap, pv->pv_va & PV_VAMASK);
> > > > > > >     3c10: a7 29 30 0d	sllx	%g4, 13, %l3
> > > > > > >     3c14: d2 5c 60 10	ldx	[%l1+16], %o1
> > > > > > >     3c18: d0 5c 60 08	ldx	[%l1+8], %o0
> > > > > > > --> 3c1c: 40 00 00 00	call	0
> > > > > > >     3c20: 92 0a 40 13	and	%o1, %l3, %o1
> > > > > > > 
> > > > > > > As discussed with miod I suspect the crash actually lies inside
> > > > > > > pseg_get(), but I can't prove it.
> > > > > > 
> > > > > > Another similar crash, at the very same offset in pmap_page_protect,
> > > > > > with:
> > > > > > - pmap_collect() removed
> > > > > > - uvm_purge() applied
> > > > > > - uvm parallel fault applied
> > > > > 
> > > > > To try to reproduce this one, I went back to:
> > > > > - pmap_collect() applied
> > > > > - uvm_purge() backed out
> > > > > - uvm parallel fault applied
> > > > > - pmap_page_protect() simplification applied
> > > > > 
> > > > > In the parent mail in this thread I only dumped the first pv entry of
> > > > > the page.  Here we can see that the pmap of the second entry in the pv
> > > > > list appears corrupted.
> > > > > 
> > > > > This is relatively easy to reproduce for me, I just need to build rust
> > > > > and another big port in parallel to reproduce.  rust is a big user of
> > > > > threads.
> > > > 
> > > > I thought we already concluded that pmap_page_protect() is overly
> > > > optimistic and you had a diff to add extra locking to it.
> > > 
> > > While I have some doubts whether the atomic manipulation of the page
> > > tables correctly handles the tracking of the reference and
> > > modification bits, I do believe the locking (using the per-page mutex)
> > > is sufficient to prevent stale pmap references in the pv entries.  And
> > > I would really like to prevent the stupid lock dance that we do on
> > > other architectures.  But I must be missing something.
> > > 
> > > > I think the moment we do parallel uvm faults we run pmap_page_protect()
> > > > concurrent with some other pmap functions and get fireworks.
> > > 
> > > That would most likely be pmap_enter().
> > 
> > I will run with uvm parallel fault handling on my test sparc64 and see if
> > I can also hit the errors jca hit.
> 
> Could I move forward by #ifndef'ing sparc64?  I'd appreciate if somebody
> could debug the pmap issue.  In the meantime I believe we should enable
> this on the other architectures.
> 
> ok?

I'm very much unsure about this. At least from my understanding, enabling
this causes Mac M1 and M2 machines to lock up, and they require dlg's
parking mutex diff to work. There is also the bug report on bugs@ about a
dual-socket amd64 system which hangs using nfdump, which again very much
points at some pmap/IPI issue.

Yes, we can run a while with this enabled to get more feedback, but I'm
not sure we have the time to find all the issues before release,
especially on any arch apart from amd64 and arm64.
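As an aside, the "lock dance" Mark refers to above looks roughly like the
sketch below on other pmaps. This is only an illustration, not the sparc64
code and not something that compiles as-is: the field and helper names
(pm_mtx, mdpage.pv_list, mtx_enter_try()) are approximations, and the real
implementations differ in the details.

	/*
	 * Illustrative sketch only.  Lock order is assumed to be
	 * pmap mutex before per-page pv mutex, so while walking the
	 * pv list under the pv mutex we may only try-lock the pmap
	 * mutex and have to back off and restart on failure.
	 */
	void
	pmap_page_protect_dance(struct vm_page *pg)
	{
		pv_entry_t pv;
		struct pmap *pm;

		mtx_enter(&pg->mdpage.pvmtx);
	restart:
		for (pv = pg->mdpage.pv_list; pv != NULL; pv = pv->pv_next) {
			pm = pv->pv_pmap;
			if (!mtx_enter_try(&pm->pm_mtx)) {
				/*
				 * Drop the pv mutex, take both locks in
				 * the proper order, then start over since
				 * the pv list may have changed underneath
				 * us.  Keeping pm alive across this window
				 * is the hard part -- a stale pv->pv_pmap
				 * here is exactly the kind of corruption
				 * discussed in this thread.
				 */
				mtx_leave(&pg->mdpage.pvmtx);
				mtx_enter(&pm->pm_mtx);
				mtx_enter(&pg->mdpage.pvmtx);
				mtx_leave(&pm->pm_mtx);
				goto restart;
			}
			/* ... clear the mapping and save REF/MOD bits ... */
			mtx_leave(&pm->pm_mtx);
		}
		mtx_leave(&pg->mdpage.pvmtx);
	}

The appeal of the sparc64 approach is that it only ever takes the per-page
mutex, but that is only safe if a pv entry can never point at a pmap that
is concurrently going away, which is what the corrupted second pv entry
above puts in doubt.
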
> Index: uvm/uvm_fault.c
> ===================================================================
> RCS file: /cvs/src/sys/uvm/uvm_fault.c,v
> diff -u -p -r1.170 uvm_fault.c
> --- uvm/uvm_fault.c	14 Jul 2025 08:45:16 -0000	1.170
> +++ uvm/uvm_fault.c	14 Aug 2025 10:57:15 -0000
> @@ -662,7 +662,7 @@ uvm_fault(vm_map_t orig_map, vaddr_t vad
>  	flt.access_type = access_type;
>  	flt.narrow = FALSE;		/* assume normal fault for now */
>  	flt.wired = FALSE;		/* assume non-wired fault for now */
> -#if notyet
> +#ifndef __sparc64__
>  	flt.upper_lock_type = RW_READ;
>  	flt.lower_lock_type = RW_READ;	/* shared lock for now */
>  #else
> 

-- 
:wq Claudio