From: Martin Pieuchot
Subject: Re: uvm_purge()
To: Mark Kettenis
Cc: tech@openbsd.org
Date: Wed, 14 May 2025 11:55:32 +0200

On 13/05/25(Tue) 22:31, Mark Kettenis wrote:
> > Date: Tue, 13 May 2025 14:21:44 +0200
> > From: Martin Pieuchot
> > 
> > On 12/05/25(Mon) 22:00, Mark Kettenis wrote:
> > > > Date: Mon, 12 May 2025 20:42:40 +0200
> > > > From: Martin Pieuchot
> > > 
> > > Hi Martin,
> > > 
> > > > Diff below moves the tearing down of VM space to exit1().  It
> > > > implies that processes will now be charged for cleaning their VM
> > > > space.  As a result their %sys time will increase.
> > > > 
> > > > uvm_purge() is called in the "top" part of exit1() when the
> > > > process is still allowed to sleep.  To allow this piece of code
> > > > to be executed in parallel, all recursive kernel_lock tokens
> > > > need to be released.
> > > > 
> > > > In the diff below uvm_purge() is called twice.  The first time
> > > > in exit1() and the second time, doing nothing but integrity
> > > > checks, in uvm_exit().  Another approach could be to just check
> > > > that the map is empty.  Any preferences?
> > > > 
> > > > Comments?  Oks?
> > > 
> > > As I explained to Claudio in the "Faster _exit(2) for a faster
> > > userland: R.I.P the reaper" thread, this will disable the TLB
> > > flushing optimization I added on arm64 a few months ago.  The
> > > uvm_purge() call needs to come after the userland page tables
> > > have been disabled and the TLBs have been flushed.
> > > 
> > > Unfortunately the way I hooked that operation into
> > > pmap_deactivate() makes this non-trivial.  The easiest way out
> > > may be to add a new pmap_purge() that does this operation and
> > > that gets called from an appropriate spot in exit1().  This must
> > > happen after the other threads of the process have exited, so
> > > somewhere after the point where the main thread waits for them
> > > to die.
> > 
> > As long as this only touches userland page tables it can be inside
> > uvm_purge().
> > > I can cook you a diff for that.
> > 
> > Thanks!
> 
> So here is a proof of concept; not sure I'm entirely happy with this
> yet.  But hopefully it gets across what the challenge is.

Lovely, more inputs below that hopefully can help simplify this proof
of concept.

> It is important that when we flush the TLB, none of the threads in a
> process have the userland page tables active.  On arm64 the CPUs can
> speculatively load TLB entries even if you don't reference the pages!
> The current code deactivates the page tables in cpu_exit() and uses
> atomics to make sure that the last thread that goes through cpu_exit()
> also flushes the TLB.  At that point none of the threads can sleep, so
> we can simply set the TTBR0_EL1 register to point at a page filled
> with zeroes and don't have to worry about a context switch resetting
> TTBR0_EL1 to point at the userland page tables again.  (We re-use the
> page filled with zeroes from the kernel pmap for that.)
> 
> But uvm_purge() can sleep, so it needs to be called much earlier.  We
> can prevent a context switch from reloading TTBR0_EL1 by also setting
> pm->pm_pt0pa to point at that page filled with zeroes.  But if we do
> that for any non-main thread, we run into problems because another
> thread that is still running userland code might context switch and
> end up in an endless loop faulting because it has a page table without
> valid entries in it.
> 
> So that's why my new pmap_exit() function gets called in different
> places for the main thread and other threads.  The main thread calls
> pmap_exit() at the place where your diff calls uvm_purge(), so it
> could be rolled into that function.
> 
> I think this strategy will work for other pmaps as well, but I
> probably should look at one or two other ones.

uvm_purge() is executed by the last thread in a process.  When this
happens the other threads might still be at the end of exit1() but
none of them will go back to userland.
I have other diffs to improve the synchronization between the siblings
of a process when exiting, mostly to remove unnecessary context
switches.  They are built on the current assumption that uvm_purge()
is called once all the other threads have cleaned up their state.

This part of my work removes the notion of 'main thread' and the
P_THREAD flag.  Instead the last thread of a process to enter exit1()
will clean up the per-process state.

Could you use those two pieces of information to simplify your diff?

> Index: arch/arm64/arm64/pmap.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/arm64/arm64/pmap.c,v
> diff -u -p -r1.111 pmap.c
> --- arch/arm64/arm64/pmap.c	14 Feb 2025 18:36:04 -0000	1.111
> +++ arch/arm64/arm64/pmap.c	13 May 2025 20:09:45 -0000
> @@ -946,6 +946,7 @@ pmap_pinit(pmap_t pm)
>  	pmap_extract(pmap_kernel(), l0va, (paddr_t *)&pm->pm_pt0pa);
>  
>  	pmap_reference(pm);
> +	pm->pm_active = 1;
>  }
>  
>  int pmap_vp_poolcache = 0; /* force vp poolcache to allocate late */
> @@ -1482,7 +1483,6 @@ pmap_activate(struct proc *p)
>  {
>  	pmap_t pm = p->p_vmspace->vm_map.pmap;
>  
> -	atomic_inc_int(&pm->pm_active);
>  	if (p == curproc && pm != curcpu()->ci_curpm)
>  		pmap_setttb(p);
>  }
> @@ -1493,6 +1493,11 @@ pmap_activate(struct proc *p)
>  void
>  pmap_deactivate(struct proc *p)
>  {
> +}
> +
> +void
> +pmap_exit(struct proc *p)
> +{
>  	pmap_t pm = p->p_vmspace->vm_map.pmap;
>  
>  	KASSERT(p == curproc);
> @@ -1500,11 +1505,15 @@ pmap_deactivate(struct proc *p)
>  	WRITE_SPECIALREG(ttbr0_el1, pmap_kernel()->pm_pt0pa);
>  	__asm volatile("isb");
>  
> -	if (atomic_dec_int_nv(&pm->pm_active) > 0)
> +	if ((p->p_flag & P_THREAD))
>  		return;
>  
> +	KASSERT(p->p_p->ps_threadcnt == 0);
> +
>  	cpu_tlb_flush_asid_all((uint64_t)pm->pm_asid << 48);
>  	cpu_tlb_flush_asid_all((uint64_t)(pm->pm_asid | ASID_USER) << 48);
> +	pm->pm_pt0pa = pmap_kernel()->pm_pt0pa;
> +	pm->pm_active = 0;
>  }
>  
>  /*
> Index: arch/arm64/include/pmap.h
> ===================================================================
> RCS file: /cvs/src/sys/arch/arm64/include/pmap.h,v
> diff -u -p -r1.28 pmap.h
> --- arch/arm64/include/pmap.h	3 Feb 2025 17:59:40 -0000	1.28
> +++ arch/arm64/include/pmap.h	13 May 2025 20:09:45 -0000
> @@ -125,6 +125,8 @@ int	pmap_fault_fixup(pmap_t, vaddr_t, vm
>  #define __HAVE_PMAP_MPSAFE_ENTER_COW
>  #define __HAVE_PMAP_POPULATE
>  
> +void	pmap_exit(struct proc *);
> +
>  #endif /* _KERNEL && !_LOCORE */
>  
>  #ifndef _LOCORE
> Index: kern/kern_exit.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_exit.c,v
> diff -u -p -r1.245 kern_exit.c
> --- kern/kern_exit.c	2 May 2025 05:04:38 -0000	1.245
> +++ kern/kern_exit.c	13 May 2025 20:09:48 -0000
> @@ -241,6 +241,8 @@ exit1(struct proc *p, int xexit, int xsi
>  		 */
>  		if (pr->ps_pptr->ps_sigacts->ps_sigflags & SAS_NOCLDWAIT)
>  			atomic_setbits_int(&pr->ps_flags, PS_NOZOMBIE);
> +
> +		pmap_exit(p);
>  	}
>  
>  	p->p_fd = NULL;		/* zap the thread's copy */
> @@ -369,6 +371,7 @@ exit1(struct proc *p, int xexit, int xsi
>  
>  	/* just a thread? check if last one standing. */
>  	if (p->p_flag & P_THREAD) {
> +		pmap_exit(p);
>  		/* scheduler_wait_hook(pr->ps_mainproc, p); XXX */
>  		mtx_enter(&pr->ps_mtx);
>  		pr->ps_exitcnt--;