From: Peter Hessler
Subject: Re: Please test: parallel fault handling
To: tech@openbsd.org
Date: Thu, 14 Aug 2025 22:40:28 +0200

On 2025 Aug 14 (Thu) at 13:03:27 +0200 (+0200), Martin Pieuchot wrote:
:On 11/06/25(Wed) 13:14, Claudio Jeker wrote:
:> On Wed, Jun 11, 2025 at 12:12:58PM +0200, Mark Kettenis wrote:
:> > > Date: Mon, 9 Jun 2025 14:07:47 +0200
:> > > From: Claudio Jeker
:> > >
:> > > On Mon, Jun 09, 2025 at 01:46:31PM +0200, Jeremie Courreges-Anglas wrote:
:> > > > On Tue, Jun 03, 2025 at 06:21:17PM +0200, Jeremie Courreges-Anglas wrote:
:> > > > > On Sun, May 25, 2025 at 11:20:46PM +0200, Jeremie Courreges-Anglas wrote:
:> > > > > > On Thu, May 22, 2025 at 08:19:38PM +0200, Mark Kettenis wrote:
:> > > > > > > > Date: Thu, 22 May 2025 18:54:08 +0200
:> > > > > > > > From: Jeremie Courreges-Anglas
:> > > > > >
:> > > > > > [...]
:> > > > > >
:> > > > > > > > *Bzzzt*
:> > > > > > > >
:> > > > > > > > The same LDOM was busy compiling two devel/llvm copies under dpb(1).
:> > > > > > > > Input welcome, I'm not sure yet what other ddb commands could help.
:> > > > > > > >
:> > > > > > > > login: panic: trap type 0x34 (mem address not aligned): pc=1012f68 npc=1012f6c pstate=820006
:> > > > > > > > Stopped at      db_enter+0x8:   nop
:> > > > > > > >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
:> > > > > > > >   57488   1522      0        0x11          0    1  perl
:> > > > > > > >  435923   9891     55   0x1000002          0    4  cc1plus
:> > > > > > > >  135860  36368     55   0x1000002          0   13  cc1plus
:> > > > > > > >  333743  96489     55   0x1000002          0    0  cc1plus
:> > > > > > > >  433162  55422     55   0x1000002          0    9  cc1plus
:> > > > > > > >  171658  49723     55   0x1000002          0    5  cc1plus
:> > > > > > > >   47127  57536     55   0x1000002          0   10  cc1plus
:> > > > > > > >   56600   9350     55   0x1000002          0   14  cc1plus
:> > > > > > > >  159792  13842     55   0x1000002          0    6  cc1plus
:> > > > > > > >  510019  10312     55   0x1000002          0    8  cc1plus
:> > > > > > > >   20489  65709     55   0x1000002          0   15  cc1plus
:> > > > > > > >  337455  42430     55   0x1000002          0   12  cc1plus
:> > > > > > > >  401407  80906     55   0x1000002          0   11  cc1plus
:> > > > > > > >   22993  62317     55   0x1000002          0    2  cc1plus
:> > > > > > > >  114916  17058     55   0x1000002          0    7  cc1plus
:> > > > > > > > *435412  33034      0     0x14000      0x200   3K  pagedaemon
:> > > > > > > > trap(400fe6b19b0, 34, 1012f68, 820006, 3, 42) at trap+0x334
:> > > > > > > > Lslowtrap_reenter(40015a58a00, 77b5db2000, deadbeefdeadc0c7, 1d8, 2df0fc468, 468) at Lslowtrap_reenter+0xf8
:> > > > > > > > pmap_page_protect(40010716ab8, c16, 1cc9860, 193dfa0, 1cc9000, 1cc9000) at pmap_page_protect+0x1fc
:> > > > > > > > uvm_pagedeactivate(40010716a50, 40015a50d24, 18667a0, 0, 0, 1c8dac0) at uvm_pagedeactivate+0x54
:> > > > > > > > uvmpd_scan_active(0, 0, 270f2, 18667a0, 0, ffffffffffffffff) at uvmpd_scan_active+0x150
:> > > > > > > > uvm_pageout(400fe6b1e08, 55555556, 18667a0, 1c83f08, 1c83000, 1c8dc18) at uvm_pageout+0x2dc
:> > > > > > > > proc_trampoline(0, 0, 0, 0, 0, 0) at proc_trampoline+0x10
:> > > > > > > > https://www.openbsd.org/ddb.html describes the minimum info required in bug
:> > > > > > > > reports.
:> > > > > > > > Insufficient info makes it difficult to find and fix bugs.
:> > > > > > >
:> > > > > > > If there are pmap issues, pmap_page_protect() is certainly the first
:> > > > > > > place I'd look.  I'll start looking, but don't expect to have much
:> > > > > > > time until after monday.
:> > > > > >
:> > > > > > Indeed this crash lies in pmap_page_protect().  llvm-objdump -dlS says
:> > > > > > it's stopped at l.2499:
:> > > > > >
:> > > > > > 	} else {
:> > > > > > 		pv_entry_t firstpv;
:> > > > > > 		/* remove mappings */
:> > > > > >
:> > > > > > 		firstpv = pa_to_pvh(pa);
:> > > > > > 		mtx_enter(&pg->mdpage.pvmtx);
:> > > > > >
:> > > > > > 		/* First remove the entire list of continuation pv's */
:> > > > > > 		while ((pv = firstpv->pv_next) != NULL) {
:> > > > > > -->			data = pseg_get(pv->pv_pmap, pv->pv_va & PV_VAMASK);
:> > > > > >
:> > > > > > 			/* Save REF/MOD info */
:> > > > > > 			firstpv->pv_va |= pmap_tte2flags(data);
:> > > > > >
:> > > > > > ; /sys/arch/sparc64/sparc64/pmap.c:2499
:> > > > > > ;                       data = pseg_get(pv->pv_pmap, pv->pv_va & PV_VAMASK);
:> > > > > >     3c10: a7 29 30 0d	sllx	%g4, 13, %l3
:> > > > > >     3c14: d2 5c 60 10	ldx	[%l1+16], %o1
:> > > > > >     3c18: d0 5c 60 08	ldx	[%l1+8], %o0
:> > > > > > --> 3c1c: 40 00 00 00	call	0
:> > > > > >     3c20: 92 0a 40 13	and	%o1, %l3, %o1
:> > > > > >
:> > > > > > As discussed with miod I suspect the crash actually lies inside
:> > > > > > pseg_get(), but I can't prove it.
:> > > > >
:> > > > > Another similar crash, at the very same offset in pmap_page_protect,
:> > > > > with:
:> > > > > - pmap_collect() removed
:> > > > > - uvm_purge() applied
:> > > > > - uvm parallel fault applied
:> > > >
:> > > > To try to reproduce this one, I went back to:
:> > > > - pmap_collect() applied
:> > > > - uvm_purge() backed out
:> > > > - uvm parallel fault applied
:> > > > - pmap_page_protect() simplification applied
:> > > >
:> > > > In the parent mail in this thread I only dumped the first pv entry of
:> > > > the page.
:> > > > Here we can see that the pmap of the second entry in the pv
:> > > > list appears corrupted.
:> > > >
:> > > > This is relatively easy to reproduce for me, I just need to build rust
:> > > > and another big port in parallel to reproduce.  rust is a big user of
:> > > > threads.
:> > >
:> > > I thought we already concluded that pmap_page_protect() is overly
:> > > optimistic and you had a diff to add extra locking to it.
:> >
:> > While I have some doubts whether the atomic manipulation of the page
:> > tables correctly handles the tracking of the reference and
:> > modification bits, I do believe the locking (using the per-page mutex)
:> > is sufficient to prevent stale pmap references in the pv entries.  And
:> > I would really like to prevent the stupid lock dance that we do on
:> > other architectures.  But I must be missing something.
:> >
:> > > I think the moment we do parallel uvm faults we run pmap_page_protect()
:> > > concurrent with some other pmap functions and get fireworks.
:> >
:> > That would most likely be pmap_enter().
:>
:> I will run with uvm parallel fault handling on my test sparc64 and see if
:> I can also hit the errors jca hit.
:
:Could I move forward by #ifndef'ing sparc64?  I'd appreciate it if somebody
:could debug the pmap issue.  In the meantime I believe we should enable
:this on the other architectures.
:
:ok?
:

I've been running this on most of my workloads, including daily drivers
and build machines, since you posted this back in April.  Tested on
amd64, arm64, riscv64, octeon.
OK

:Index: uvm/uvm_fault.c
:===================================================================
:RCS file: /cvs/src/sys/uvm/uvm_fault.c,v
:diff -u -p -r1.170 uvm_fault.c
:--- uvm/uvm_fault.c	14 Jul 2025 08:45:16 -0000	1.170
:+++ uvm/uvm_fault.c	14 Aug 2025 10:57:15 -0000
:@@ -662,7 +662,7 @@ uvm_fault(vm_map_t orig_map, vaddr_t vad
: 	flt.access_type = access_type;
: 	flt.narrow = FALSE;		/* assume normal fault for now */
: 	flt.wired = FALSE;		/* assume non-wired fault for now */
:-#if notyet
:+#ifndef __sparc64__
: 	flt.upper_lock_type = RW_READ;
: 	flt.lower_lock_type = RW_READ;	/* shared lock for now */
: #else
:

-- 
When I was a boy I was told that anybody could become President.
Now I'm beginning to believe it.
		-- Clarence Darrow