
Comment by dundarious

1 year ago

What does Zig offer in the way of builtin SIMD support, beyond overloads for trivial arithmetic operations? 90% of the utility of SIMD is outside of those types of simple operations. I like Zig, but my understanding is you have to reach for CPU specific builtins for the vast majority of cases, just like in C/C++.

GCC and Clang support the vector_size attribute and overloaded arithmetic operators on those "vectorized" types, and a LOT more besides -- in fact, that's how intrinsics like _mm256_mul_ps are implemented: `#define _mm256_mul_ps(a,b) (__m256)((v8sf)(a) * (v8sf)(b))`. The utility of all of that is much, much greater than what's available in Zig.
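For anyone who hasn't used the extension, here is a minimal sketch of what the `vector_size` attribute and the overloaded operators look like in practice. The names `v4sf`, `mul_add4` and `demo_mul_add4` are made up for illustration, and a 16-byte vector is used instead of the 32-byte `v8sf` above only so it compiles without extra target flags. It assumes GCC or Clang.

```c
#include <assert.h>

/* GCC/Clang extension: a 16-byte vector holding four floats. */
typedef float v4sf __attribute__((vector_size(16)));

/* Element-wise multiply-add written with ordinary operators;
   the compiler picks the SIMD instructions for the target. */
v4sf mul_add4(v4sf a, v4sf b, v4sf c) {
    return a * b + c;
}

/* Small self-check (hypothetical helper, not part of any library). */
void demo_mul_add4(void) {
    v4sf x   = {1.0f, 2.0f, 3.0f, 4.0f};
    v4sf two = {2.0f, 2.0f, 2.0f, 2.0f};
    v4sf one = {1.0f, 1.0f, 1.0f, 1.0f};
    v4sf r = mul_add4(x, two, one);
    /* Vector types can be subscripted like arrays. */
    for (int i = 0; i < 4; i++) {
        assert(r[i] == x[i] * 2.0f + 1.0f);
    }
}
```

This only covers the arithmetic-operator part of the extension; shuffles and conversions use compiler-specific builtins that differ between GCC and Clang.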

Zig ships LLVM's internal generic SIMD stuff, which is fairly common for newish systems languages. If you want dynamic shuffles or even moderately exotic things like maddubs or aesenc then you need to use LLVM intrinsics for specific instructions or asm.
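To show why something like maddubs doesn't fall out of generic vector operators, here is a scalar reference model of what the 128-bit form of the instruction computes (exposed in C as `_mm_maddubs_epi16`). This is a sketch for illustration, not the intrinsic itself; `maddubs_ref`, `saturate_i16` and `demo_maddubs_ref` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 32-bit intermediate into the int16_t range (signed saturation). */
int16_t saturate_i16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Scalar model of maddubs over 16 byte lanes:
   bytes of a are treated as unsigned, bytes of b as signed;
   adjacent products are summed and saturated to 16 bits. */
void maddubs_ref(const uint8_t a[16], const int8_t b[16], int16_t out[8]) {
    for (int i = 0; i < 8; i++) {
        int32_t lo = (int32_t)a[2 * i] * (int32_t)b[2 * i];
        int32_t hi = (int32_t)a[2 * i + 1] * (int32_t)b[2 * i + 1];
        out[i] = saturate_i16(lo + hi);
    }
}

/* Small self-check (hypothetical helper). */
void demo_maddubs_ref(void) {
    uint8_t a[16];
    int8_t b[16];
    int16_t out[8];
    for (int i = 0; i < 16; i++) { a[i] = 255; b[i] = 127; }
    maddubs_ref(a, b, out);
    /* 255 * 127 * 2 = 64770 does not fit in int16_t, so it saturates. */
    assert(out[0] == INT16_MAX);
    for (int i = 0; i < 16; i++) { b[i] = -128; }
    maddubs_ref(a, b, out);
    /* 255 * -128 * 2 = -65280 saturates at the negative end. */
    assert(out[7] == INT16_MIN);
}
```

The mixed signedness, the pairwise horizontal add and the saturation are all baked into one instruction, which is the kind of thing a generic vector type has no operator for.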

I’m also wondering what “built in” even means. Many languages have SIMD, Vector, Matrix, Quaternion and similar types as part of the standard library, but not necessarily as their own keywords. By this metric, C#/.NET and Java have SIMD too.

  • Java's Panama Vectors are a work in progress and are far from being competitive with .NET's implementation of SIMD abstractions, which is mostly on par with Zig, Swift and Mojo.

    You can usually port existing SIMD algorithms from C/C++/Rust to C# with few changes while retaining the same performance, and it's practically impossible to do so in Java.

    I feel like C veterans often don't realize how unnecessarily ceremonious platform-specific SIMD code is, given the progress in portable abstractions. Unless you need an exotic instruction that does not translate nicely across architectures or into common patterns, there is little reason to have a bespoke platform-specific path.

    • I'm primarily writing "general-purpose" code (especially parsers and formatters) rather than code that does the same math operation on a big array, so it's usually not reasonable to even use the same approach to the problem with different vector extensions :(
