From: Steffen Nurpmeso
Subject: Re: [PATCH] amd64: import optimized memcmp from FreeBSD
To: Christian Schulte
Cc: tech@openbsd.org
Date: Mon, 02 Dec 2024 20:31:47 +0100

Christian Schulte wrote in
 :
 |On 11/29/24 17:08, Mateusz Guzik wrote:
 |> On Fri, Nov 29, 2024 at 4:49 PM Stuart Henderson \
 |> wrote:
 |>> On 2024/11/29 02:01, Mateusz Guzik wrote:
 |>>> The rep-prefixed cmps is incredibly slow even on modern CPUs.
 |>>>
 |>>> The new implementation uses regular cmp to do it.
 |>>>
 |>>> The code got augmented to account for retguard, otherwise it matches \
 |>>> FreeBSD.
 |>>
 |>> Would that make sense in libc too?
 |>
 |> this and the other routines are faster than what's in openbsd libc,
 |> but they are not the optimal choice due to lack of simd. definitely an
 |> improvement for the time being.
 |
 |This has been done before. E.g.:
 |
 |
 |
 |Removed nearly 13 years ago in favour of the .c version being faster
 |(gcc builtin back then), then readded based on NetBSD a couple of years
 |later. When reading

Fyi, it originally came from DragonFly:

  https://marc.info/?l=dragonfly-commits&m=132241713812022&w=2

I remember i could not believe it, and tried it out.
It really was "magic".  (By then i was still "prowd" of my ASM
stuff, having spent surely hundreds of hours fiddling with
optimizations.)  And, unless i misremember, it had nothing to do
with gcc.  I guess(ed) it was CPU internal stuff for
"[cld;] repne scasb".

 ...

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
|
|And in Fall, feel "The Dropbear Bard"s ball(s).
|
|The banded bear
|without a care,
|Banged on himself for e'er and e'er
|
|Farewell, dear collar bear