From: Peter Hessler
Subject: Re: Please test: parallel fault handling
To: tech@openbsd.org
Date: Thu, 14 Aug 2025 22:40:28 +0200

On 2025 Aug 14 (Thu) at 13:03:27 +0200 (+0200), Martin Pieuchot wrote:
:On 11/06/25(Wed) 13:14, Claudio Jeker wrote:
:> On Wed, Jun 11, 2025 at 12:12:58PM +0200, Mark Kettenis wrote:
:> > > Date: Mon, 9 Jun 2025 14:07:47 +0200
:> > > From: Claudio Jeker
:> > >
:> > > On Mon, Jun 09, 2025 at 01:46:31PM +0200, Jeremie Courreges-Anglas wrote:
:> > > > On Tue, Jun 03, 2025 at 06:21:17PM +0200, Jeremie Courreges-Anglas wrote:
:> > > > > On Sun, May 25, 2025 at 11:20:46PM +0200, Jeremie Courreges-Anglas wrote:
:> > > > > > On Thu, May 22, 2025 at 08:19:38PM +0200, Mark Kettenis wrote:
:> > > > > > > > Date: Thu, 22 May 2025 18:54:08 +0200
:> > > > > > > > From: Jeremie Courreges-Anglas
:> > > > > >
:> > > > > > [...]
:> > > > > >
:> > > > > > > > *Bzzzt*
:> > > > > > > >
:> > > > > > > > The same LDOM was busy compiling two devel/llvm copies under dpb(1).
:> > > > > > > > Input welcome, I'm not sure yet what other ddb commands could help.
:> > > > > > > >
:> > > > > > > > login: panic: trap type 0x34 (mem address not aligned): pc=1012f68 npc=1012f6c pstate=820006
:> > > > > > > > Stopped at      db_enter+0x8:   nop
:> > > > > > > >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
:> > > > > > > >   57488   1522      0        0x11          0    1  perl
:> > > > > > > >  435923   9891     55   0x1000002          0    4  cc1plus
:> > > > > > > >  135860  36368     55   0x1000002          0   13  cc1plus
:> > > > > > > >  333743  96489     55   0x1000002          0    0  cc1plus
:> > > > > > > >  433162  55422     55   0x1000002          0    9  cc1plus
:> > > > > > > >  171658  49723     55   0x1000002          0    5  cc1plus
:> > > > > > > >   47127  57536     55   0x1000002          0   10  cc1plus
:> > > > > > > >   56600   9350     55   0x1000002          0   14  cc1plus
:> > > > > > > >  159792  13842     55   0x1000002          0    6  cc1plus
:> > > > > > > >  510019  10312     55   0x1000002          0    8  cc1plus
:> > > > > > > >   20489  65709     55   0x1000002          0   15  cc1plus
:> > > > > > > >  337455  42430     55   0x1000002          0   12  cc1plus
:> > > > > > > >  401407  80906     55   0x1000002          0   11  cc1plus
:> > > > > > > >   22993  62317     55   0x1000002          0    2  cc1plus
:> > > > > > > >  114916  17058     55   0x1000002          0    7  cc1plus
:> > > > > > > > *435412  33034      0     0x14000      0x200   3K  pagedaemon
:> > > > > > > > trap(400fe6b19b0, 34, 1012f68, 820006, 3, 42) at trap+0x334
:> > > > > > > > Lslowtrap_reenter(40015a58a00, 77b5db2000, deadbeefdeadc0c7, 1d8, 2df0fc468, 468) at Lslowtrap_reenter+0xf8
:> > > > > > > > pmap_page_protect(40010716ab8, c16, 1cc9860, 193dfa0, 1cc9000, 1cc9000) at pmap_page_protect+0x1fc
:> > > > > > > > uvm_pagedeactivate(40010716a50, 40015a50d24, 18667a0, 0, 0, 1c8dac0) at uvm_pagedeactivate+0x54
:> > > > > > > > uvmpd_scan_active(0, 0, 270f2, 18667a0, 0, ffffffffffffffff) at uvmpd_scan_active+0x150
:> > > > > > > > uvm_pageout(400fe6b1e08, 55555556, 18667a0, 1c83f08, 1c83000, 1c8dc18) at uvm_pageout+0x2dc
:> > > > > > > > proc_trampoline(0, 0, 0, 0, 0, 0) at proc_trampoline+0x10
:> > > > > > > > https://www.openbsd.org/ddb.html describes the minimum info required in bug
:> > > > > > > > reports.
:> > > > > > > > Insufficient info makes it difficult to find and fix bugs.
:> > > > > > >
:> > > > > > > If there are pmap issues, pmap_page_protect() is certainly the first
:> > > > > > > place I'd look.  I'll start looking, but don't expect to have much
:> > > > > > > time until after monday.
:> > > > > >
:> > > > > > Indeed this crash lies in pmap_page_protect().  llvm-objdump -dlS says
:> > > > > > it's stopped at l.2499:
:> > > > > >
:> > > > > > 	} else {
:> > > > > > 		pv_entry_t firstpv;
:> > > > > > 		/* remove mappings */
:> > > > > >
:> > > > > > 		firstpv = pa_to_pvh(pa);
:> > > > > > 		mtx_enter(&pg->mdpage.pvmtx);
:> > > > > >
:> > > > > > 		/* First remove the entire list of continuation pv's */
:> > > > > > 		while ((pv = firstpv->pv_next) != NULL) {
:> > > > > > -->			data = pseg_get(pv->pv_pmap, pv->pv_va & PV_VAMASK);
:> > > > > >
:> > > > > > 			/* Save REF/MOD info */
:> > > > > > 			firstpv->pv_va |= pmap_tte2flags(data);
:> > > > > >
:> > > > > > ; /sys/arch/sparc64/sparc64/pmap.c:2499
:> > > > > > ;                       data = pseg_get(pv->pv_pmap, pv->pv_va & PV_VAMASK);
:> > > > > >     3c10: a7 29 30 0d	sllx	%g4, 13, %l3
:> > > > > >     3c14: d2 5c 60 10	ldx	[%l1+16], %o1
:> > > > > >     3c18: d0 5c 60 08	ldx	[%l1+8], %o0
:> > > > > > --> 3c1c: 40 00 00 00	call	0
:> > > > > >     3c20: 92 0a 40 13	and	%o1, %l3, %o1
:> > > > > >
:> > > > > > As discussed with miod I suspect the crash actually lies inside
:> > > > > > pseg_get(), but I can't prove it.
:> > > > >
:> > > > > Another similar crash, at the very same offset in pmap_page_protect,
:> > > > > with:
:> > > > > - pmap_collect() removed
:> > > > > - uvm_purge() applied
:> > > > > - uvm parallel fault applied
:> > > >
:> > > > To try to reproduce this one, I went back to:
:> > > > - pmap_collect() applied
:> > > > - uvm_purge() backed out
:> > > > - uvm parallel fault applied
:> > > > - pmap_page_protect() simplification applied
:> > > >
:> > > > In the parent mail in this thread I only dumped the first pv entry of
:> > > > the page.
:> > > > Here we can see that the pmap of the second entry in the pv
:> > > > list appears corrupted.
:> > > >
:> > > > This is relatively easy to reproduce for me, I just need to build rust
:> > > > and another big port in parallel to reproduce.  rust is a big user of
:> > > > threads.
:> > >
:> > > I thought we already concluded that pmap_page_protect() is overly
:> > > optimistic and you had a diff to add extra locking to it.
:> >
:> > While I have some doubts whether the atomic manipulation of the page
:> > tables correctly handles the tracking of the reference and
:> > modification bits, I do believe the locking (using the per-page mutex)
:> > is sufficient to prevent stale pmap references in the pv entries.  And
:> > I would really like to prevent the stupid lock dance that we do on
:> > other architectures.  But I must be missing something.
:> >
:> > > I think the moment we do parallel uvm faults we run pmap_page_protect()
:> > > concurrent with some other pmap functions and get fireworks.
:> >
:> > That would most likely be pmap_enter().
:>
:> I will run with uvm parallel fault handling on my test sparc64 and see if
:> I can also hit the errors jca hit.
:
:Could I move forward by #ifndef'ing sparc64?  I'd appreciate it if somebody
:could debug the pmap issue.  In the meantime I believe we should enable
:this on the other architectures.
:
:ok?
:

I've been running this on most of my workloads, including daily drivers
and build machines, since you posted this back in April.  Tested on
amd64, arm64, riscv64, octeon.
OK

:Index: uvm/uvm_fault.c
:===================================================================
:RCS file: /cvs/src/sys/uvm/uvm_fault.c,v
:diff -u -p -r1.170 uvm_fault.c
:--- uvm/uvm_fault.c	14 Jul 2025 08:45:16 -0000	1.170
:+++ uvm/uvm_fault.c	14 Aug 2025 10:57:15 -0000
:@@ -662,7 +662,7 @@ uvm_fault(vm_map_t orig_map, vaddr_t vad
: 	flt.access_type = access_type;
: 	flt.narrow = FALSE;		/* assume normal fault for now */
: 	flt.wired = FALSE;		/* assume non-wired fault for now */
:-#if notyet
:+#ifndef __sparc64__
: 	flt.upper_lock_type = RW_READ;
: 	flt.lower_lock_type = RW_READ;	/* shared lock for now */
: #else
:

-- 
When I was a boy I was told that anybody could become President.
Now I'm beginning to believe it.
		-- Clarence Darrow