From: Christian Schulte
Subject: Re: [PATCH] amd64: import optimized memcmp from FreeBSD
To: tech@openbsd.org
Date: Mon, 2 Dec 2024 03:21:09 +0100

On 11/30/24 15:59, Christian Weisgerber wrote:
> Christian Schulte:
>
>> lately I thought about rewriting those functions to use simd, but never
>> looked further into doing it, because I do not know if simd
>> registers/instructions could be used there at all. Could they? Instead
>
> Presumably not in the kernel as we would have to save/restore
> FPU/SIMD registers around those calls.
>
>> of working on the same thing every now and then, I would be in favour of
>> providing routines using the most performant instructions the
>> architecture has to offer (simd), if possible, and be done with it.
>> Would not mind writing those, if they would be accepted.
>
> FreeBSD already did all that work recently.
> https://freebsdfoundation.org/blog/a-sneak-peek-simd-enhanced-string-functions-for-amd64/
>

Do you know if they have published some test data somewhere? Something
which can be used to compare results.

I got curious about it (again), and instead of directly jumping onto the
assembly train, I first wanted to find out what the compiler produces
when memcmp is written in C. Unlike the current memcmp in base, a C
memcmp compiled with -O2 makes the compiler emit SIMD instructions. I am
only testing the case where both buffers are equal, so that the whole
length has to be scanned. I tested various len values
(0,1,2,3,4,5,6,7,8...). Either I am a total idiot or the C version really
outperforms the assembly version.

This is for len == 517:

  %   cumulative     self                  self     total
 time    seconds  seconds       calls   ms/call   ms/call  name
 72.1     220.33   220.33  1000000000      0.00      0.00  _libc_memcmp [3]
 17.9     274.98    54.65  1000000000      0.00      0.00  c_memcmp [4]

And this is for len == 13:

 23.6      20.67     9.04  1000000000      0.00      0.00  _libc_memcmp [4]
 20.9      28.67     8.00  1000000000      0.00      0.00  c_memcmp [5]

This can hardly be right, can it?

-- 
Christian
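
For reference, a minimal sketch of what a plain C memcmp of the kind
described above might look like. This is an assumption, not the code that
was measured: the body of c_memcmp is a guess at a straightforward byte
loop, and sign() and agrees_with_libc() are hypothetical helpers added
only to check it against libc. Whether the compiler vectorizes such a
loop at -O2 depends on the compiler and its version.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical plain-C memcmp; a guess, not the code that was measured. */
int
c_memcmp(const void *s1, const void *s2, size_t n)
{
	const unsigned char *p1 = s1;
	const unsigned char *p2 = s2;

	while (n-- != 0) {
		if (*p1 != *p2)
			return (int)*p1 - (int)*p2;
		p1++;
		p2++;
	}
	return 0;
}

/* Reduce a result to -1, 0 or 1 so it can be compared with libc's. */
static int
sign(int x)
{
	return (x > 0) - (x < 0);
}

/*
 * Return 1 if c_memcmp and libc memcmp agree in sign for two buffers of
 * length len (at most 1024): first with equal contents, so the whole
 * length is scanned, then with only the last byte differing.
 */
int
agrees_with_libc(size_t len)
{
	unsigned char a[1024], b[1024];

	assert(len <= sizeof(a));
	memset(a, 0x5a, sizeof(a));
	memset(b, 0x5a, sizeof(b));
	if (sign(c_memcmp(a, b, len)) != sign(memcmp(a, b, len)))
		return 0;
	if (len > 0) {
		b[len - 1] = 0x5b;
		if (sign(c_memcmp(a, b, len)) != sign(memcmp(a, b, len)))
			return 0;
	}
	return 1;
}
```

Calling agrees_with_libc() for each tested length (for example 13 and
517) before timing anything rules out the possibility that one of the two
functions is faster only because it returns a wrong result.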