Index | Thread | Search

From:
Mark Kettenis <mark.kettenis@xs4all.nl>
Subject:
Re: wired pages vs sysctl(2)
To:
Martin Pieuchot <mpi@grenadille.net>
Cc:
tech@openbsd.org, kettenis@openbsd.org
Date:
Thu, 20 Mar 2025 22:57:14 +0100

Download raw body.

Thread
  • Martin Pieuchot:

    wired pages vs sysctl(2)

  • Alexander Bluhm:

    wired pages vs sysctl(2)

  • > Date: Wed, 12 Mar 2025 15:24:25 +0100
    > From: Martin Pieuchot <mpi@grenadille.net>
    > 
    > On 04/03/25(Tue) 14:45, Martin Pieuchot wrote:
    > > On 27/02/25(Thu) 14:44, Martin Pieuchot wrote:
    > > > Since the 90s sysctl(2) has been wiring pages that contain the syscall
    > > > arguments. 
    > > > 
    > > > In uvm/uvm_fault.c r1.129 kettenis@ fixed one of the races exposed by
    > > > syzkaller where a sibling thread munmap(2)ed the range currently wired
    > > > by sysctl(2).
    > > > 
    > > > Sadly the fix is incomplete and the reproducer attached triggers the
    > > > following panic:
    > > > 
    > > > panic: uvm_fault_unwire_locked: address not in map
    > > > Stopped at      db_enter+0x14:  popq    %rbp
    > > >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
    > > > *129832   2744      0         0x3          0    0  sysctl_wired
    > > >   84116  96377      0        0x10        0x8    2  sshd-session
    > > >    6897  50184      0     0x14000      0x200    1  softnet0
    > > > db_enter() at db_enter+0x14
    > > > panic(ffffffff8261d529) at panic+0xdd
    > > > uvm_fault_unwire_locked(fffffd878156ab58,20000000,20001000) at uvm_fault_unwire
    > > > _locked+0x246
    > > > uvm_fault_unwire(fffffd878156ab58,20000000,20001000) at uvm_fault_unwire+0x44
    > > > sysctl_vsunlock(20000040,0) at sysctl_vsunlock+0x46
    > > > net_sysctl(ffff800055d411b4,3,20000040,ffff800055d411e8,20000080,c,cf641d03c835
    > > > bbeb) at net_sysctl+0x309
    > > > sys_sysctl(ffff800055c99c78,ffff800055d41300,ffff800055d41270) at sys_sysctl+0x
    > > > 1f8
    > > > syscall(ffff800055d41300) at syscall+0x5ec
    > > > Xsyscall() at Xsyscall+0x128
    > > >  
    > > > Syzkaller has two reports for this:
    > > >   https://syzkaller.appspot.com/bug?extid=d9b926edfd5f64a66c58
    > > >   https://syzkaller.appspot.com/bug?extid=a12540517e3a76a6a490
    > > > 
    > > > It is not clear to my why vslock(9) is needed today.  That said the diff
    > > > below completes kettenis@'s approach by making vslock similar to mlock(2).
    > > > With it, vsunlock() now silently fails without crashing the kernel and I
    > > > can no longer reproduce the panic.  
    > > 
    > > Note that there's another issue with the logic that wires pages during
    > > sysctl(2).  If a page has been previously mlock(2)ed and we pass it as
    > > arguments to sysctl(2), it will be munlock(2)ed when returning to
    > > userland.
    > 
    > What I said is not true.  The page will still be wired because map
    > entries are properly refcounted in uvm_map_pageable_wire().
    > 
    > With this diff the race left is the possibility to unwire a page wired
    > by the kernel via sysctl(2).
    > 
    > > As long as we need vslock(9), which is something that I don't know, might
    > > want to distinguish between userland vs kernel wiring like FreeBSD do.
    > 
    > I'm not so sure this is strictly needed.  If thread A calls sysctl(2)
    > and thread B does munlock(2) on the same page, then the output of
    > sysctl(2) might not be coherent.  I believe this race is acceptable.
    > 
    > Here's the diff again, so ok?
    
    So because uvm_map_pageable() locks the map, the race that I fixed can
    no longer happen.  And therefore the changes to uvm_fault_wire() made
    in r1.128 of uvm_faul.c can be reverted.  Can you mention that in the
    commit message?
    
    In the long run I think we should try to avoid uvm_vslock(), so making
    it a bit more expensive shouldn't be a major issue.
    
    ok kettenis@
    
    > Index: uvm/uvm_fault.c
    > ===================================================================
    > RCS file: /cvs/src/sys/uvm/uvm_fault.c,v
    > diff -u -p -r1.165 uvm_fault.c
    > --- uvm/uvm_fault.c	10 Mar 2025 14:13:58 -0000	1.165
    > +++ uvm/uvm_fault.c	12 Mar 2025 14:18:22 -0000
    > @@ -1737,16 +1737,12 @@ uvm_fault_unwire_locked(vm_map_t map, va
    >  		 * find the map entry for the current address.
    >  		 */
    >  		KASSERT(va >= entry->start);
    > -		while (entry && va >= entry->end) {
    > +		while (va >= entry->end) {
    >  			next = RBT_NEXT(uvm_map_addr, entry);
    > +			KASSERT(next != NULL && next->start <= entry->end);
    >  			entry = next;
    >  		}
    >  
    > -		if (entry == NULL)
    > -			return;
    > -		if (va < entry->start)
    > -			continue;
    > -
    >  		/*
    >  		 * lock it.
    >  		 */
    > @@ -1776,7 +1772,7 @@ uvm_fault_unwire_locked(vm_map_t map, va
    >  	}
    >  
    >  	if (oentry != NULL) {
    > -		uvm_map_unlock_entry(oentry);
    > +		uvm_map_unlock_entry(entry);
    >  	}
    >  }
    >  
    > Index: uvm/uvm_glue.c
    > ===================================================================
    > RCS file: /cvs/src/sys/uvm/uvm_glue.c,v
    > diff -u -p -r1.87 uvm_glue.c
    > --- uvm/uvm_glue.c	28 Oct 2024 08:25:32 -0000	1.87
    > +++ uvm/uvm_glue.c	12 Mar 2025 14:18:22 -0000
    > @@ -114,7 +114,7 @@ uvm_vslock(struct proc *p, caddr_t addr,
    >  	if (end <= start)
    >  		return (EINVAL);
    >  
    > -	return uvm_fault_wire(map, start, end, access_type);
    > +	return uvm_map_pageable(map, start, end, FALSE, 0);
    >  }
    >  
    >  /*
    > @@ -125,13 +125,14 @@ uvm_vslock(struct proc *p, caddr_t addr,
    >  void
    >  uvm_vsunlock(struct proc *p, caddr_t addr, size_t len)
    >  {
    > +	struct vm_map *map = &p->p_vmspace->vm_map;
    >  	vaddr_t start, end;
    >  
    >  	start = trunc_page((vaddr_t)addr);
    >  	end = round_page((vaddr_t)addr + len);
    >  	KASSERT(end > start);
    >  
    > -	uvm_fault_unwire(&p->p_vmspace->vm_map, start, end);
    > +	uvm_map_pageable(map, start, end, TRUE, 0);
    >  }
    >  
    >  /*
    > 
    > 
    > 
    
    
  • Martin Pieuchot:

    wired pages vs sysctl(2)

  • Alexander Bluhm:

    wired pages vs sysctl(2)