From: Crystal Kolipe Subject: Re: [PATCH] amd64: import optimized memcmp from FreeBSD] To: tech , Mateusz Guzik Date: Thu, 05 Jun 2025 05:56:22 -0300 On Wed, Jun 04, 2025 at 10:25:36AM +0100, Stuart Henderson wrote: > I just noticed that I've still got this diff in my tree and wondered > if it's going to go anywhere. > > The thread from last year got derailed into discussion about libc > (partly my fault) and SIMD (not likely to be really useful in the > kernel) - https://marc.info/?t=173284225400001&r=1&w=2 - but I don't > think there were any particular objections, and there were good test > reports, for the diff as presented. Reading the code it appears to do what it claims to do without introducing any new bugs. But if the existing code is going to be replaced at all, presumably we want to ensure that the replacement is the most appropriate, (in terms of trade off of performance and complexity), for the foreseeable future to avoid churn. On this point, I'm dubious. The lack of comments and numeric labels doesn't make the assessment any easier, because it isn't obvious whether any particular part of the code is intentionally done for performance reasons or whether it's an oversight. For example, there are two code paths that can deal with the case of having exactly 32 bytes left in the buffer. If you enter memcmp() with rdx==32, then the byte comparison is handled by the routine at 101632. If you enter memcmp() with rdx being a multiple of 32, then you end up spinning in 103200, because the cmpq $32,%rdx is followed by a jae, (jump above _or equal_), meaning it's testing rdx >= 32. So the last 32 bytes are compared by 103200 instead of 101632. If the 103200 routine is indeed preferred for performance, (and on what hardware?), then why is the initial cmpq in 101632 followed by a ja instead of a jae? I can see what the code _does_, but without comments who knows what it's _supposed_ to do?