
From: Christian Schulte <cs@schulte.it>
Subject: Re: [PATCH] amd64: import optimized memcmp from FreeBSD
To: tech@openbsd.org
Date: Mon, 2 Dec 2024 03:21:09 +0100

On 11/30/24 15:59, Christian Weisgerber wrote:
> Christian Schulte:
> 
>> lately I thought about rewriting those functions to use simd, but never
>> looked further into doing it, because I do not know if simd
>> registers/instructions could be used there at all. Could they? Instead
> 
> Presumably not in the kernel as we would have to save/restore
> FPU/SIMD registers around those calls.
> 
>> of working on the same thing every now and then, I would be in favour of
>> providing routines using the most performant instructions the
>> architecture has to offer (simd), if possible, and be done with it.
>> Would not mind writing those, if they would be accepted.
> 
> FreeBSD already did all that work recently.
> https://freebsdfoundation.org/blog/a-sneak-peek-simd-enhanced-string-functions-for-amd64/
> 

Do you know if they have published test data somewhere, something that
can be used to compare results? I got curious about it (again), and
instead of jumping straight onto the assembly train, I first wanted to
find out what the compiler produces for a memcmp function written in C.
Unlike the current memcmp in base, a C memcmp compiled with -O2 makes
the compiler emit SIMD instructions. I am only testing the case where
both buffers are equal, so that the whole length has to be scanned.
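
For reference, a minimal sketch of what a memcmp written in C can look
like. The source of the c_memcmp in the profile is not included in this
mail, so the body here is an assumption: a plain byte loop that leaves
all optimization to the compiler. Whether a given compiler vectorizes a
loop like this at -O2 depends on the compiler and its version; looking
at the output of cc -O2 -S is the way to check.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for c_memcmp; the version that produced the
 * numbers in this mail is not shown. Compares n bytes as unsigned
 * char and returns the difference of the first mismatching pair, or
 * 0 if the buffers are equal.
 */
int
c_memcmp(const void *s1, const void *s2, size_t n)
{
	const unsigned char *p1 = s1;
	const unsigned char *p2 = s2;
	size_t i;

	for (i = 0; i < n; i++) {
		if (p1[i] != p2[i])
			return p1[i] - p2[i];
	}
	return 0;
}
```

With equal buffers the loop never takes the early return, which is the
case measured here.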

I tested various len values (0, 1, 2, 3, 4, 5, 6, 7, 8, ...). Either I am
a total idiot or the C version really outperforms the assembly version.
This is the profile for len == 517:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 72.1     220.33   220.33 1000000000     0.00     0.00  _libc_memcmp [3]
 17.9     274.98    54.65 1000000000     0.00     0.00  c_memcmp [4]

And this is for len == 13 (same columns as above):

 23.6      20.67     9.04 1000000000     0.00     0.00  _libc_memcmp [4]
 20.9      28.67     8.00 1000000000     0.00     0.00  c_memcmp [5]

This can hardly be right, can it?
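
One way to cross-check numbers like these outside the profiler is a
small timing loop over two equal buffers. The sketch below is only an
illustration: the helper name time_equal, the fill byte and the use of
clock() are assumptions and not the harness that produced the profiles
above. The comparison function is called through a pointer and its
result is added to a volatile variable, which is meant to keep the
compiler from dropping the calls; it is no guarantee against every
optimization.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef int (*cmp_fn)(const void *, const void *, size_t);

/*
 * Hypothetical helper: run `iters` calls of `cmp` on two equal
 * buffers of `len` bytes and return the CPU time used in seconds,
 * or -1.0 if the buffers cannot be allocated.
 */
double
time_equal(cmp_fn cmp, size_t len, unsigned long iters)
{
	volatile int sink = 0;	/* consumes every result */
	unsigned char *a, *b;
	unsigned long i;
	clock_t c0, c1;

	/* Allocate at least one byte so that len == 0 works too. */
	a = malloc(len ? len : 1);
	b = malloc(len ? len : 1);
	if (a == NULL || b == NULL) {
		free(a);
		free(b);
		return -1.0;
	}
	/* Same contents in both, so the whole length is scanned. */
	memset(a, 0x5a, len);
	memset(b, 0x5a, len);

	c0 = clock();
	for (i = 0; i < iters; i++)
		sink += cmp(a, b, len);
	c1 = clock();

	free(a);
	free(b);
	return (double)(c1 - c0) / CLOCKS_PER_SEC;
}
```

Calling time_equal(memcmp, 517, iters) and the same with the C version
for the lengths of interest gives two figures that can be compared
directly.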

-- 
Christian