Index | Thread | Search

From:
Mark Kettenis <mark.kettenis@xs4all.nl>
Subject:
Re: km_alloc(9), UAREA and kv_pageable
To:
mpi@grenadille.net, miod@openbsd.org
Cc:
tech@openbsd.org
Date:
Tue, 22 Oct 2024 23:15:52 +0200

Download raw body.

Thread
> Date: Sun, 20 Oct 2024 21:42:02 +0200
> From: Mark Kettenis <mark.kettenis@xs4all.nl>
> 
> > Date: Sun, 20 Oct 2024 19:47:23 +0200
> > From: Martin Pieuchot <mpi@grenadille.net>
> > 
> > On 20/10/24(Sun) 14:36, Mark Kettenis wrote:
> > > > Date: Sun, 20 Oct 2024 11:37:57 +0200
> > > > From: Martin Pieuchot <mpi@grenadille.net>
> > > 
> > > Sorry, I didn't have time yet to investigate.  But I'd rather not
> > > change the km_alloc() API if it is not necessary, especially when the
> > > it introduces a somewhat vague concept like "managed" that doesn't
> > > even make sense in the context where it gets used.
> > > 
> > > > On 10/10/24(Thu) 10:40, Martin Pieuchot wrote:
> > > > > On 09/10/24(Wed) 13:39, Mark Kettenis wrote:
> > > > > > > Date: Wed, 9 Oct 2024 13:16:25 +0200
> > > > > > > From: Martin Pieuchot <mpi@grenadille.net>
> > > > > > > 
> > > > > > > On 02/10/24(Wed) 17:28, Martin Pieuchot wrote:
> > > > > > > > It is currently impossible to use km_alloc(9) to get a managed mapping
> > > > > > > > backed with physical pages allocated up front.  I guess this is why
> > > > > > > > uvm_km_kmemalloc_pla() is still used to allocate the UAREA.
> > > > > > > > 
> > > > > > > > To fix that, I suggest the diff below which turns the "kp_pageable" flag
> > > > > > > > of km_alloc(9) into a "kp_managed".  This new flag no longer implies
> > > > > > > > "kp_nomem" so I updated the description and fixed the remaining XXX to
> > > > > > > > preserve existing behavior of the `kp_pageable' global.
> > > > > > > > 
> > > > > > > > With this change I believe we could also get rid of uvm_km_zalloc() used
> > > > > > > > in the i386 pmap.
> > > > > > > > 
> > > > > > > > Comments?  Oks?
> > > > > > > 
> > > > > > > Anyone?
> > > > > > 
> > > > > > I don't really understand why this needs to be "managed" memory.  We
> > > > > > stopped paging out the uarea ages ago, and I wonder if all this
> > > > > > complexity is just from the time when we did.  Can't this just be
> > > > > > kp_zero memory?
> > > > > 
> > > > > On amd64 using kp_zero results in the kernel faulting in loop from which
> > > > > it cannot recover.  Could you please look at it and tell me if there is a
> > > > > bug somewhere else?
> > > 
> > > Since the memory is no longer "managed", setguardpage() needs to use
> > > pmap_kremove() instead of pmap_remove().
> > 
> > Nice found!
> > 
> > > The following diff works for me on both amd64 and arm64.
> > 
> > Lovely, should we test this on more architectures?  Or are you confident
> > with it as it is?
> 
> Looks like amd64 is the only architecture that does that guardpage
> thing.  So I think this is fine.

However, I think this diff (that is committed now) leaks a page for
each proc that we create.  This didn't happen with the old code
because the kernel object kept track of the pages that we removed
using pmap_remove(), and those pages would be free when the kernel
stack was unmapped using uvm_unmap().

This no longer happens when we use km_alloc(), so we have to free that
page ourselves.

ok?

P.S. I have a diff that moves this code to MI code that I'm currently
     testing.  But this fix should go in first.


Index: arch/amd64/amd64/vm_machdep.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/vm_machdep.c,v
diff -u -p -r1.48 vm_machdep.c
--- arch/amd64/amd64/vm_machdep.c	21 Oct 2024 18:27:34 -0000	1.48
+++ arch/amd64/amd64/vm_machdep.c	22 Oct 2024 21:05:16 -0000
@@ -135,8 +135,15 @@ cpu_exit(struct proc *p)
 void
 setguardpage(struct proc *p)
 {
+	struct vm_page *pg = NULL;
+	paddr_t pa;
+
+	if (pmap_extract(pmap_kernel(), (vaddr_t)p->p_addr + PAGE_SIZE, &pa))
+		pg = PHYS_TO_VM_PAGE(pa);
 	pmap_kremove((vaddr_t)p->p_addr + PAGE_SIZE, PAGE_SIZE);
 	pmap_update(pmap_kernel());
+	if (pg)
+		uvm_pagefree(pg);
 }
 
 struct kmem_va_mode kv_physwait = {