From: Jeremie Courreges-Anglas <jca@wxcvbn.org>
Subject: Re: sparc64 pmap (was: Re: Please test: parallel fault handling)
To: Mark Kettenis <mark.kettenis@xs4all.nl>
Cc: tech@openbsd.org, miod@online.fr
Date: Tue, 27 May 2025 17:21:16 +0200

    On Tue, May 27, 2025 at 03:26:29PM +0200, Mark Kettenis wrote:
    > > Date: Mon, 26 May 2025 13:38:39 +0200
    > > From: Jeremie Courreges-Anglas <jca@wxcvbn.org>
    > 
    > Hi Jeremie,
    > 
    > > On Mon, May 26, 2025 at 09:23:02AM +0200, Jeremie Courreges-Anglas wrote:
    > > > On Mon, May 26, 2025 at 12:04:59AM +0200, Mark Kettenis wrote:
    > > > > 
    > > > > Can you try the diff below.  I've nuked pmap_collect() on other
    > > > > architectures since it is hard to make "mpsafe".  The current sparc64
    > > > > certainly isn't.
    > > > > 
    > > > > If this fixes the crashes, I can see if it is possible to make it
    > > > > "mpsafe" on sparc64.
    > > > 
    > > > I see.  Indeed this function stood out when I looked at the file.  Can
    > > > I really just empty it out like this, or should I also drop
    > > > __HAVE_PMAP_COLLECT?
    > > > 
    > > > Yesterday the machine crashed again in a nearby place in
    > > > pmap_page_protect().
    > 
    > That is still without my pmap_collect() diff, isn't it?
    
    Yep, it happened before you sent the diff.
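
    For reference, my reading of "emptying it out" is something like the
    sketch below (illustrative only, not necessarily your actual diff;
    whether __HAVE_PMAP_COLLECT should also go is the open question
    above):

        /*
         * Sketch: make pmap_collect() a no-op on sparc64, assuming the
         * usual pmap_collect(pmap_t) form.
         */
        void
        pmap_collect(struct pmap *pm)
        {
                /*
                 * Freeing page-table pages here would race with
                 * concurrent pseg_get()/pseg_set() callers and with the
                 * inline pseg lookup in the low-level fault handlers,
                 * so do nothing until that can be made MP-safe.
                 */
        }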
    
    > > > login: pmap_page_protect: gotten pseg empty!
    > > > Stopped at      pmap_page_protect+0x620:        nop
    > > > ddb{4}> tr
    > > > uvm_pagedeactivate(4000d1510a0, 19d2ce0, 193edd0, 40015a507e4, 1, 1) at uvm_pagedeactivate+0x54
    > > > uvn_flush(401053a3310, 0, 11c8000, 14, 1, 0) at uvn_flush+0x448
    > > > uvn_detach(401053a3310, 40101633630, 1, 0, 0, 1000000) at uvn_detach+0x158
    > > > uvm_unmap_detach(400fe6b5c68, 0, 9a7, 40015a507e4, 40015a507e4, 18f7828) at uvm_unmap_detach+0x68
    > > > uvm_map_teardown(400130912e0, 4000, 1939240, 4010599c000, 54d, 17b2ff8) at uvm_map_teardown+0x184
    > > > uvmspace_free(1cd7000, 400130912e0, 4, 17b2ff8, 0, 6) at uvmspace_free+0x64
    > > > reaper(40015a507e0, 40015a507e0, db, 100, 1c04038, 4000) at reaper+0x100
    > > > proc_trampoline(0, 0, 0, 0, 0, 0) at proc_trampoline+0x10
    > > > ddb{4}>
    > > > 
    > > > This time it's pmap.c l.2534:
    > > > 
    > > >                 /* Then remove the primary pv */
    > > >                 if (pv->pv_pmap != NULL) {
    > > >                         data = pseg_get(pv->pv_pmap, pv->pv_va & PV_VAMASK);
    > > > 
    > > >                         /* Save REF/MOD info */
    > > >                         pv->pv_va |= pmap_tte2flags(data);
    > > >                         if (pseg_set(pv->pv_pmap, pv->pv_va & PV_VAMASK, 0, 0)) {
    > > >                                 printf("pmap_page_protect: gotten pseg empty!\n");
    > > > -->                             db_enter();
    > > >                                 /* panic? */
    > > >                         }
    > > > 
    > > > It's not the first crash with uvmspace_free() going through
    > > > uvm_pagedeactivate(), so I agree with you that a fix is probably
    > > > needed for mpi's uvm_purge() diff (which I haven't tested yet on that
    > > > box).
    > > 
    > > FWIW pmap_page_protect() itself looks like a potential problem, as you
    > > suspected.  The diff below locks the pmaps in pmap_page_protect(PROT_NONE),
    > > which is used by uvm_pagedeactivate().
    > 
    > I don't rule out issues in pmap_page_protect() yet.  However, this
    > could still be another manifestation of the issue with pmap_collect().
    > 
    > > Walking the pv list and locking the pmap introduces a lock ordering
    > > constraint, something that the arm64 and riscv64 implems handle with a
    > > slightly convoluted approach.  Surely sparc64 should have its
    > > pseg_get/set calls protected by the pmap lock too.  Now, sparc64 is
    > > special in several regards: each mapping must be checked for mod/ref
    > > information by accessing psegs, pv list handling is inlined, and the
    > > first element in the pv list is embedded in struct mdpage.
    > > 
    > > The diff below drops the current pv entry from the list before acting
    > > on it, thus avoiding concurrent access and the "start all over again"
    > > idiom.  For the "first" pv entry, we can make a copy of it so that we
    > > can drop the pv lock.  It could be argued that this static pv
    > > header/entry is making things more complex than they need to be.
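
    (To expand on what I mean in the paragraph above: the shape of it is
    roughly as below, with made-up, simplified names rather than the
    actual pmap.c / struct mdpage fields.)

        /*
         * Illustrative sketch only: copy the embedded "first" pv entry
         * under the pv mutex, then drop that mutex before the pseg
         * work, so no pv lock is held when the pmap gets locked.
         * pg_pv_mtx() and pg_first_pv() are invented stand-ins.
         */
        struct pv_copy {
                struct pmap     *pv_pmap;
                vaddr_t          pv_va;
        };

        void
        flush_first_pv(struct vm_page *pg)
        {
                struct pv_copy copy;

                mtx_enter(pg_pv_mtx(pg));
                copy.pv_pmap = pg_first_pv(pg)->pv_pmap;
                copy.pv_va = pg_first_pv(pg)->pv_va & PV_VAMASK;
                pg_first_pv(pg)->pv_pmap = NULL;        /* unhook it */
                mtx_leave(pg_pv_mtx(pg));

                if (copy.pv_pmap == NULL)
                        return;

                /*
                 * Only now, with the pv mutex released, take the
                 * per-pmap lock and do the pseg_get()/pseg_set() calls
                 * on (copy.pv_pmap, copy.pv_va).
                 */
        }
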
    > > 
    > > The code keeps the mod/ref accounting.  But does it make sense to keep
    > > the mod/ref information at all, if we're flushing/destroying all the
    > > mappings to a page?  If it is indeed pointless, which I'm uncertain
    > > about, we could drop the pseg_get/mtx_enter/etc chunks.  The
    > > performance of the diff below is similar to before applying it (with
    > > the uvm parallel fault diff applied).
    > > 
    > > In this very function there's the "lower protection" code path that
    > > walks the pv list and accesses psegs.  Not sure yet how to tackle it,
    > > but maybe we don't need to handle it yet?
    > > 
    > > Thanks to miod for shedding some light on this pmap implementation
    > > over the weekend (he hasn't seen the diff yet).
    > > 
    > > Input welcome.
    > 
    > The pseg_get() and pseg_set() functions use atomic instructions to
    > manipulate the page tables, so they shouldn't need the pmap lock.  But
    > we can't remove pages from the page tables (which is what
    > > pmap_collect() does), because then these functions may read data from a
    > recycled page (or even modify it).  Note that this isn't just a
    > problem for pseg_get() and pseg_set(), but also for the "inline
    > pseg_get()" that is part of low-level fault handlers.
    
    ack, good point.  I guess it makes sense for the pmap_collect()
    problem to kick in while I build huge ports concurrently, since it
    gets called when other methods to reclaim memory aren't sufficient.
    I'll resume my tests when I get my hands back on this LDOM.
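
    If I follow, the update side on its own is fine because it boils
    down to the usual compare-and-swap loop, roughly like this (made-up
    names, not the actual pseg_set()):

        /*
         * Illustrative only: atomically merge software REF/MOD-style
         * bits into a TTE word without taking the pmap lock.
         * atomic_cas_ulong() returns the previous value.
         */
        unsigned long
        tte_or_bits(volatile unsigned long *tte_slot, unsigned long bits)
        {
                unsigned long old, new;

                do {
                        old = *tte_slot;
                        new = old | bits;
                } while (atomic_cas_ulong(tte_slot, old, new) != old);

                /*
                 * Safe against concurrent updates, but not against the
                 * page holding *tte_slot being freed and recycled,
                 * which is what pmap_collect() would make possible.
                 */
                return (old);
        }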
    
    > But there is a potential race with the REF/MOD bits that may need some
    > attention.  I'll have another look at this diff soon.
    
    'k
    
    -- 
    jca
    
    