From: Mark Kettenis Subject: Re: amd64 pmap & resident_count To: Martin Pieuchot Cc: tech@openbsd.org Date: Wed, 28 Aug 2024 15:23:22 +0200 > Date: Wed, 28 Aug 2024 09:57:39 +0200 > From: Martin Pieuchot > > On 27/08/24(Tue) 22:43, Mark Kettenis wrote: > > > Date: Fri, 23 Aug 2024 16:19:37 +0200 > > > From: Martin Pieuchot > > > > > > On a 24 CPUs amd64 machine building a kernel with -j24 always result in > > > a corrupted "RES" `pmap_kernel()->pm_stats.resident_count' view in top(1). > > > > > > Modifications to this field are (mostly) serialized by the pmap's mutex. > > > Sadly pmap_enter() do not grab the mutex for the kernel's pmap. Diff > > > below changes that and fix the issue for me. I believe it is easier and > > > cheaper than using atomics everywhere. > > > > > > ok? > > > > Not sure this is a good idea. Not grabbing the pmap mutex for the > > kernel pmap was somewhat by design; I think at some point I convinced > > myself that doing so could lead to deadlocks. > > I'm not convinced. The splvm() in pmap_growkernel() is not enough to > protect from multiple callers. That is a different problem though. It is perfectly fine to call pmap_enter() or pmap_remove() for the kernel map on one CPU while another CPU is in pmap_growkernel(). We just have to make sure that two CPUs don't race eachother in pmap_growkernel(). > > So yes, the statistics for the kernel pmap may end up corrupted. That > > shouldn't be a problem though, since these statistics are pretty much > > meaningless for the kernel pmap. I don't even think our various > > architectures do the accounting in a consistent way. > > I want to keep them. We have too few counters to see what's happening > in the kernel pmap. What do you think these counters count? Only a small part of kernel memeory is "managed", and I don't think any of it is pageable/swappable anymore. > > So maybe we should just not bother with these for the kernel pmap? > > And rather than wrapping each of the pm_stats.resident_count and > > pm_stats.wiredcount with a an > > > > if (pmap != pmap_kernel()) > > > > we should just change the few places where we actually use/report the > > counts to report zero for the kernel map? > > No, I want to fix them. That's what the last diff in the thread does. > > > > Index: arch/amd64/amd64/pmap.c > > > =================================================================== > > > RCS file: /cvs/src/sys/arch/amd64/amd64/pmap.c,v > > > diff -u -p -r1.171 pmap.c > > > --- arch/amd64/amd64/pmap.c 8 Aug 2024 15:57:22 -0000 1.171 > > > +++ arch/amd64/amd64/pmap.c 23 Aug 2024 14:13:46 -0000 > > > @@ -416,10 +416,6 @@ pmap_map_ptes(struct pmap *pmap) > > > > > > KASSERT(pmap->pm_type != PMAP_TYPE_EPT); > > > > > > - /* the kernel's pmap is always accessible */ > > > - if (pmap == pmap_kernel()) > > > - return 0; > > > - > > > /* > > > * Lock the target map before switching to its page tables to > > > * guarantee other CPUs have finished changing the tables before > > > @@ -427,6 +423,10 @@ pmap_map_ptes(struct pmap *pmap) > > > */ > > > mtx_enter(&pmap->pm_mtx); > > > > > > + /* the kernel's pmap is always accessible */ > > > + if (pmap == pmap_kernel()) > > > + return 0; > > > + > > > cr3 = rcr3(); > > > KASSERT((cr3 & CR3_PCID) == PCID_KERN || > > > (cr3 & CR3_PCID) == PCID_PROC); > > > @@ -443,8 +443,7 @@ pmap_map_ptes(struct pmap *pmap) > > > void > > > pmap_unmap_ptes(struct pmap *pmap, paddr_t save_cr3) > > > { > > > - if (pmap != pmap_kernel()) > > > - mtx_leave(&pmap->pm_mtx); > > > + mtx_leave(&pmap->pm_mtx); > > > > > > if (save_cr3 != 0) > > > lcr3(save_cr3); > > > > > > >