From: Crystal Kolipe <kolipe.c@exoticsilicon.com>
Subject: Re: [PATCH] amd64: import optimized memcmp from FreeBSD]
To: tech <tech@openbsd.org>, Mateusz Guzik <mjguzik@gmail.com>
Date: Thu, 05 Jun 2025 05:56:22 -0300

On Wed, Jun 04, 2025 at 10:25:36AM +0100, Stuart Henderson wrote:
> I just noticed that I've still got this diff in my tree and wondered
> if it's going to go anywhere.
> 
> The thread from last year got derailed into discussion about libc
> (partly my fault) and SIMD (not likely to be really useful in the
> kernel) - https://marc.info/?t=173284225400001&r=1&w=2 - but I don't
> think there were any particular objections, and there were good test
> reports, for the diff as presented.

Reading the code it appears to do what it claims to do without introducing any
new bugs.

But if the existing code is going to be replaced at all, presumably we want to
ensure that the replacement is the most appropriate, (in terms of trade off of
performance and complexity), for the foreseeable future to avoid churn.

On this point, I'm dubious.  The lack of comments and numeric labels doesn't
make the assessment any easier, because it isn't obvious whether any
particular part of the code is intentionally done for performance reasons or
whether it's an oversight.

For example, there are two code paths that can deal with the case of having
exactly 32 bytes left in the buffer.

If you enter memcmp() with rdx==32, then the byte comparison is handled by
the routine at 101632.

If you enter memcmp() with rdx being a multiple of 32, then you end up
spinning in 103200, because the cmpq $32,%rdx is followed by a jae,
(jump above _or equal_), meaning it's testing rdx >= 32.

So the last 32 bytes are compared by 103200 instead of 101632.

If the 103200 routine is indeed preferred for performance, (and on what
hardware?), then why is the initial cmpq in 101632 followed by a ja instead of
a jae?

I can see what the code _does_, but without comments who knows what it's
_supposed_ to do?