From: Mark Kettenis <mark.kettenis@xs4all.nl>
Subject: Re: clang: enable blake3 asm optimizations on amd64?
To: Stuart Henderson <stu@spacehopper.org>
Cc: naddy@mips.inka.de, tech@openbsd.org
Date: Sun, 18 Feb 2024 11:37:54 +0100

> Date: Sun, 18 Feb 2024 10:31:43 +0000
> From: Stuart Henderson <stu@spacehopper.org>
> 
> On 2024/02/18 10:53, Mark Kettenis wrote:
> > > Date: Sun, 18 Feb 2024 09:25:38 +0000
> > > From: Stuart Henderson <stu@spacehopper.org>
> > > 
> > > On 2024/02/17 20:59, Christian Weisgerber wrote:
> > > > From some cursory grepping, I don't think BLAKE3 is used much if
> > > > at all in clang.  So I don't know if this buys us anything.  It
> > > > does not reduce the time required by clang to compile itself.
> > > 
> > > Any idea how to make sure that this code is exercised?
> > > 
> > > > +.if ${MACHINE_ARCH} == "amd64"
> > > > +SRCS+=	blake3_sse2_x86-64_unix.S \
> > > > +	blake3_sse41_x86-64_unix.S \
> > > > +	blake3_avx2_x86-64_unix.S \
> > > > +	blake3_avx512_x86-64_unix.S
> > > > +.endif
> > > 
> > > I have a suspicion that AVX512 might not work with OpenBSD.
> > > We've had a diagnosed problem with this in node, and now a suspected
> > > one in GHC.
> > 
> > AVX512 doesn't work with OpenBSD.  Code is supposed to check whether
> > AVX512 has been enabled by the OS (in addition to checking the CPUID
> > bits).  But I wouldn't be surprised if there is code out there that
> > just checks the CPUID bits and just assumes the OS has enabled support
> > if they're present.
> 
> Ah, here was the commit that fixed node:
> 
> https://github.com/simdutf/simdutf/pull/243/commits/d00dc3cae3d22880cad7f3e60892f729a264ad2d
> 
> src/gnu/llvm/llvm/lib/Support/BLAKE3/blake3_dispatch.c does look like it
> has a similar check (using xgetbv) so I think there's a good chance it
> will work correctly.

Yes, using xgetbv to check what the OS has enabled is the right approach.
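
For illustration, a minimal sketch of the kind of check being discussed:
AVX512 is only treated as usable when the CPUID bits are present and XCR0
(read with xgetbv) shows that the OS has enabled the opmask and ZMM state.
This is not the code from blake3_dispatch.c or simdutf; the function and
macro names are hypothetical, and the host probe assumes GCC or clang on
amd64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.1:ECX bit 27: the OS has enabled XSAVE, so xgetbv is usable. */
#define CPUID1_ECX_OSXSAVE	(1u << 27)
/* CPUID.(EAX=7,ECX=0):EBX bit 16: AVX512F is implemented by the CPU. */
#define CPUID7_EBX_AVX512F	(1u << 16)
/* XCR0 bits 1 (SSE), 2 (AVX), 5 (opmask), 6 (ZMM_Hi256), 7 (Hi16_ZMM). */
#define XCR0_AVX512_STATE	0xe6u

/*
 * Pure decision logic (hypothetical helper): AVX512 may only be used if
 * the CPU advertises it *and* the OS has enabled all the required state
 * components in XCR0.
 */
bool
avx512_usable(uint32_t cpuid1_ecx, uint32_t cpuid7_ebx, uint64_t xcr0)
{
	if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
		return false;	/* XCR0 cannot even be read */
	if (!(cpuid7_ebx & CPUID7_EBX_AVX512F))
		return false;	/* CPU has no AVX512F */
	return (xcr0 & XCR0_AVX512_STATE) == XCR0_AVX512_STATE;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>

/* Read XCR0; only valid once OSXSAVE has been confirmed. */
uint64_t
read_xcr0(void)
{
	uint32_t eax, edx;

	__asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
	return ((uint64_t)edx << 32) | eax;
}

/* Probe the running machine (hypothetical wrapper). */
bool
host_avx512_usable(void)
{
	unsigned int a, b, c, d, ecx1, ebx7 = 0;

	if (!__get_cpuid(1, &a, &b, &c, &d))
		return false;
	ecx1 = c;
	if (!(ecx1 & CPUID1_ECX_OSXSAVE))
		return false;	/* xgetbv would fault here */
	if (__get_cpuid_count(7, 0, &a, &b, &c, &d))
		ebx7 = b;
	return avx512_usable(ecx1, ebx7, read_xcr0());
}
#endif
```

Code that stops after the CPUID test corresponds to skipping the last
line of avx512_usable(); on a system where the OS has not enabled the
AVX512 state, that is the case that goes wrong.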

I actually have AVX512 hardware now, so I'm looking at implementing
proper support for it.