From: Martin Pieuchot Subject: Re: wired pages vs sysctl(2) To: Mark Kettenis Cc: tech@openbsd.org, kettenis@openbsd.org Date: Fri, 21 Mar 2025 12:17:02 +0100 On 20/03/25(Thu) 22:57, Mark Kettenis wrote: > > Date: Wed, 12 Mar 2025 15:24:25 +0100 > > From: Martin Pieuchot > > > > On 04/03/25(Tue) 14:45, Martin Pieuchot wrote: > > > On 27/02/25(Thu) 14:44, Martin Pieuchot wrote: > > > > Since the 90s sysctl(2) has been wiring pages that contain the syscall > > > > arguments. > > > > > > > > In uvm/uvm_fault.c r1.129 kettenis@ fixed one of the races exposed by > > > > syzkaller where a sibling thread munmap(2)ed the range currently wired > > > > by sysctl(2). > > > > > > > > Sadly the fix is incomplete and the reproducer attached triggers the > > > > following panic: > > > > > > > > panic: uvm_fault_unwire_locked: address not in map > > > > Stopped at db_enter+0x14: popq %rbp > > > > TID PID UID PRFLAGS PFLAGS CPU COMMAND > > > > *129832 2744 0 0x3 0 0 sysctl_wired > > > > 84116 96377 0 0x10 0x8 2 sshd-session > > > > 6897 50184 0 0x14000 0x200 1 softnet0 > > > > db_enter() at db_enter+0x14 > > > > panic(ffffffff8261d529) at panic+0xdd > > > > uvm_fault_unwire_locked(fffffd878156ab58,20000000,20001000) at uvm_fault_unwire > > > > _locked+0x246 > > > > uvm_fault_unwire(fffffd878156ab58,20000000,20001000) at uvm_fault_unwire+0x44 > > > > sysctl_vsunlock(20000040,0) at sysctl_vsunlock+0x46 > > > > net_sysctl(ffff800055d411b4,3,20000040,ffff800055d411e8,20000080,c,cf641d03c835 > > > > bbeb) at net_sysctl+0x309 > > > > sys_sysctl(ffff800055c99c78,ffff800055d41300,ffff800055d41270) at sys_sysctl+0x > > > > 1f8 > > > > syscall(ffff800055d41300) at syscall+0x5ec > > > > Xsyscall() at Xsyscall+0x128 > > > > > > > > Syzkaller has two reports for this: > > > > https://syzkaller.appspot.com/bug?extid=d9b926edfd5f64a66c58 > > > > https://syzkaller.appspot.com/bug?extid=a12540517e3a76a6a490 > > > > > > > > It is not clear to my why vslock(9) is needed today. That said the diff > > > > below completes kettenis@'s approach by making vslock similar to mlock(2). > > > > With it, vsunlock() now silently fails without crashing the kernel and I > > > > can no longer reproduce the panic. > > > > > > Note that there's another issue with the logic that wires pages during > > > sysctl(2). If a page has been previously mlock(2)ed and we pass it as > > > arguments to sysctl(2), it will be munlock(2)ed when returning to > > > userland. > > > > What I said is not true. The page will still be wired because map > > entries are properly refcounted in uvm_map_pageable_wire(). > > > > With this diff the race left is the possibility to unwire a page wired > > by the kernel via sysctl(2). > > > > > As long as we need vslock(9), which is something that I don't know, might > > > want to distinguish between userland vs kernel wiring like FreeBSD do. > > > > I'm not so sure this is strictly needed. If thread A calls sysctl(2) > > and thread B does munlock(2) on the same page, then the output of > > sysctl(2) might not be coherent. I believe this race is acceptable. > > > > Here's the diff again, so ok? > > So because uvm_map_pageable() locks the map, the race that I fixed can > no longer happen. And therefore the changes to uvm_fault_wire() made > in r1.128 of uvm_faul.c can be reverted. Can you mention that in the > commit message? I'm not sure to understand. How is that related to the locking? I though the race that you fixed was due to the fact that uvm_fault_unwire() expects previously wired entries to always be there whereas uvm_map_pageable() skip them. Is it something different? > > In the long run I think we should try to avoid uvm_vslock(), so making > it a bit more expensive shouldn't be a major issue. > > ok kettenis@ Thanks! > > Index: uvm/uvm_fault.c > > =================================================================== > > RCS file: /cvs/src/sys/uvm/uvm_fault.c,v > > diff -u -p -r1.165 uvm_fault.c > > --- uvm/uvm_fault.c 10 Mar 2025 14:13:58 -0000 1.165 > > +++ uvm/uvm_fault.c 12 Mar 2025 14:18:22 -0000 > > @@ -1737,16 +1737,12 @@ uvm_fault_unwire_locked(vm_map_t map, va > > * find the map entry for the current address. > > */ > > KASSERT(va >= entry->start); > > - while (entry && va >= entry->end) { > > + while (va >= entry->end) { > > next = RBT_NEXT(uvm_map_addr, entry); > > + KASSERT(next != NULL && next->start <= entry->end); > > entry = next; > > } > > > > - if (entry == NULL) > > - return; > > - if (va < entry->start) > > - continue; > > - > > /* > > * lock it. > > */ > > @@ -1776,7 +1772,7 @@ uvm_fault_unwire_locked(vm_map_t map, va > > } > > > > if (oentry != NULL) { > > - uvm_map_unlock_entry(oentry); > > + uvm_map_unlock_entry(entry); > > } > > } > > > > Index: uvm/uvm_glue.c > > =================================================================== > > RCS file: /cvs/src/sys/uvm/uvm_glue.c,v > > diff -u -p -r1.87 uvm_glue.c > > --- uvm/uvm_glue.c 28 Oct 2024 08:25:32 -0000 1.87 > > +++ uvm/uvm_glue.c 12 Mar 2025 14:18:22 -0000 > > @@ -114,7 +114,7 @@ uvm_vslock(struct proc *p, caddr_t addr, > > if (end <= start) > > return (EINVAL); > > > > - return uvm_fault_wire(map, start, end, access_type); > > + return uvm_map_pageable(map, start, end, FALSE, 0); > > } > > > > /* > > @@ -125,13 +125,14 @@ uvm_vslock(struct proc *p, caddr_t addr, > > void > > uvm_vsunlock(struct proc *p, caddr_t addr, size_t len) > > { > > + struct vm_map *map = &p->p_vmspace->vm_map; > > vaddr_t start, end; > > > > start = trunc_page((vaddr_t)addr); > > end = round_page((vaddr_t)addr + len); > > KASSERT(end > start); > > > > - uvm_fault_unwire(&p->p_vmspace->vm_map, start, end); > > + uvm_map_pageable(map, start, end, TRUE, 0); > > } > > > > /* > > > > > >