amd64: prefer enhanced REP MOVSB/STOSB feature if available
On Mon, Dec 22, 2025 at 01:23:18PM +0100, Martin Pieuchot wrote:
> As Mateusz Guzik pointed out recently [0] we can greatly reduce the
> number of CPU cycles spent zeroing pages by using 'rep stosb'.
>
> Diff below does that, ok?
>
> [0] https://marc.info/?l=openbsd-tech&m=176631121132731&w=2
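
For reference, the core of the technique is a single instruction; a
minimal userland sketch of the same idea, assuming a 4096-byte page
and with made-up names:

#include <stddef.h>

static inline void
zero_page_rep_stosb(void *va)
{
	size_t n = 4096;		/* PAGE_SIZE assumed */

	/* rep stosb stores %al (here 0) into (%rdi), %rcx times */
	__asm volatile("rep stosb"
	    : "+D" (va), "+c" (n)
	    : "a" (0)
	    : "memory");
}

On ERMS-capable CPUs the microcode turns this into wide cache-line
stores, which is presumably where the cycle savings come from.
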
Not my area, but I think one issue we have had since the introduction
of __HAVE_UVM_PERCPU and struct uvm_pmr_cache is that the system no
longer uses the pre-zeroed pages provided by the zerothread.
I think fixing that and giving the system a way to keep a per-CPU
magazine of zeroed pages would result in far fewer cycles wasted
holding critical locks.
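
Roughly what I mean, as a sketch only (the names and the magazine
size are made up, and the refill/drain paths to the global pool are
left out):

#define ZPG_MAGAZINE_SIZE	16	/* made-up size */

struct zpg_magazine {
	void	*zm_pages[ZPG_MAGAZINE_SIZE];	/* pre-zeroed by zerothread */
	int	 zm_count;
};

/*
 * Fast path: pop a pre-zeroed page from the local CPU's magazine.
 * The magazine is owned by the current CPU, so no lock is taken
 * here; only a refill from the global pool would need one.
 */
static void *
zpg_get(struct zpg_magazine *zm)
{
	if (zm->zm_count > 0)
		return zm->zm_pages[--zm->zm_count];
	return NULL;	/* caller refills from the locked global pool */
}
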
> Index: arch/amd64/amd64/locore.S
> ===================================================================
> RCS file: /cvs/src/sys/arch/amd64/amd64/locore.S,v
> diff -u -p -r1.151 locore.S
> --- arch/amd64/amd64/locore.S 2 Aug 2025 07:33:28 -0000 1.151
> +++ arch/amd64/amd64/locore.S 22 Dec 2025 11:54:32 -0000
> @@ -1172,6 +1172,16 @@ ENTRY(pagezero)
> lfence
> END(pagezero)
>
> +ENTRY(pagezero_erms)
> + RETGUARD_SETUP(pagezero_erms, r11)
> + movq $PAGE_SIZE,%rcx
> + xorq %rax,%rax
> + rep stosb
> + RETGUARD_CHECK(pagezero_erms, r11)
> + ret
> + lfence
> +END(pagezero_erms)
> +
> /* void pku_xonly(void) */
> ENTRY(pku_xonly)
> movq pg_xo,%rax /* have PKU support? */
> Index: arch/amd64/amd64/pmap.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/amd64/amd64/pmap.c,v
> diff -u -p -r1.182 pmap.c
> --- arch/amd64/amd64/pmap.c 15 Aug 2025 13:40:43 -0000 1.182
> +++ arch/amd64/amd64/pmap.c 22 Dec 2025 11:55:07 -0000
> @@ -1594,11 +1594,14 @@ pmap_extract(struct pmap *pmap, vaddr_t
> /*
> * pmap_zero_page: zero a page
> */
> -
> void
> pmap_zero_page(struct vm_page *pg)
> {
> - pagezero(pmap_map_direct(pg));
> + /* Prefer enhanced REP MOVSB/STOSB feature if available. */
> + if (ISSET(curcpu()->ci_feature_sefflags_ebx, SEFF0EBX_ERMS))
> + pagezero_erms(pmap_map_direct(pg));
> + else
> + pagezero(pmap_map_direct(pg));
> }
>
> /*
> Index: arch/amd64/include/pmap.h
> ===================================================================
> RCS file: /cvs/src/sys/arch/amd64/include/pmap.h,v
> diff -u -p -r1.94 pmap.h
> --- arch/amd64/include/pmap.h 7 Jul 2025 00:55:15 -0000 1.94
> +++ arch/amd64/include/pmap.h 22 Dec 2025 11:46:09 -0000
> @@ -403,6 +403,7 @@ void pmap_write_protect(struct pmap *,
> paddr_t pmap_prealloc_lowmem_ptps(paddr_t);
>
> void pagezero(vaddr_t);
> +void pagezero_erms(vaddr_t);
>
> void pmap_convert(struct pmap *, int);
> void pmap_enter_special(vaddr_t, paddr_t, vm_prot_t);
>
>
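
As an aside, the SEFF0EBX_ERMS test above runs on every call to
pmap_zero_page().  An alternative would be to pick the routine once
at CPU identification time; a sketch only, where pagezero_fn and
pagezero_pick() are hypothetical names:

/* Hypothetical function pointer, defaulting to the existing routine. */
void (*pagezero_fn)(vaddr_t) = pagezero;

void
pagezero_pick(void)
{
	/* ERMS: CPUID leaf 7, EBX bit 9 (enhanced REP MOVSB/STOSB) */
	if (ISSET(curcpu()->ci_feature_sefflags_ebx, SEFF0EBX_ERMS))
		pagezero_fn = pagezero_erms;
}

Whether the indirect call beats the cached feature-flag test is
something a profile would have to show; with retpolines the branch
may well be cheaper.
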
--
:wq Claudio