Comment by chrsig

8 months ago

> and autovectorization is brittle, writing inference or training in a way that relies on it is the last thing you want)

I'm curious if you could speak more to this? Is the concern that operations may get reordered?

> To put it in perspective, until 2021 Go was always passing the data on the stack on function calls. It has improved since then but the overall design aims to ensure common scenarios are fast (e.g. comparisons against string literals are unrolled) but once you venture outside that or if it's an optimization that requires more compiler complexity - Go is far less likely to employ it.

I agree with this assessment.

The individual operations in the repository (e.g., dot product) look like they could be autovectorized. I'm assuming they aren't because of the use of a slice. I'm mildly curious if it could be massaged into something autovectorized.

Most of my observations re: autovectorization in Go have been on fixed-size vectors and matrices, where SSE2 instructions are pretty readily available and loop unrolling is pretty simple.

I'm curious what it would produce with the matrix in a single slice rather than independent allocations. Not curious enough to start poking at it, just curious enough to ramble about it conversationally.
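For concreteness, here is a minimal sketch of the single-slice layout. The names `Matrix`, `Row`, `MulVec`, and `Dot` are hypothetical and not taken from the repository. It only illustrates the data layout: one contiguous row-major backing slice instead of one allocation per row. It makes no claim about what the compiler emits for it.

```go
package main

import "fmt"

// Dot returns the dot product of a and b (hypothetical helper).
// Re-slicing b to len(a) up front typically lets the compiler prove
// that b[i] is in range and drop the per-iteration bounds check.
func Dot(a, b []float64) float64 {
	b = b[:len(a)] // panics early if b is shorter than a
	var sum float64
	for i, x := range a {
		sum += x * b[i]
	}
	return sum
}

// Matrix is a hypothetical row-major matrix backed by one contiguous slice,
// as opposed to a [][]float64 with an independent allocation per row.
type Matrix struct {
	Rows, Cols int
	Data       []float64 // len(Data) == Rows*Cols
}

// NewMatrix allocates a zeroed rows x cols matrix in a single allocation.
func NewMatrix(rows, cols int) Matrix {
	return Matrix{Rows: rows, Cols: cols, Data: make([]float64, rows*cols)}
}

// Row returns row i as a view into the backing slice (no copy).
func (m Matrix) Row(i int) []float64 {
	return m.Data[i*m.Cols : (i+1)*m.Cols]
}

// MulVec computes the matrix-vector product, one dot product per row.
func (m Matrix) MulVec(v []float64) []float64 {
	out := make([]float64, m.Rows)
	for i := range out {
		out[i] = Dot(m.Row(i), v)
	}
	return out
}

func main() {
	m := NewMatrix(2, 3)
	copy(m.Data, []float64{1, 2, 3, 4, 5, 6})
	fmt.Println(m.MulVec([]float64{1, 0, 1})) // prints [4 10]
}
```

Whether this layout changes the generated code at all is exactly the open question; comparing the output of `go build -gcflags=-S` for both layouts would be one way to check.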

> The individual operations in the repository (e.g., dot product) look like they could be autovectorized. I'm assuming they aren't because of the use of a slice. I'm mildly curious if it could be massaged into something autovectorized.

> Most of my observations re: autovectorization in go have been on fixed sized vectors and matrices where SSE2 instructions are pretty readily available and loop unrolling is pretty simple.

Go does not have any form of autovectorization. The only way to use SIMD instructions in Go is through functions written in Go assembly (Goasm). Moreover, Go does not ship SIMD primitives in its math library; such primitives, implemented as inlineable functions that map to SIMD instructions, would make autovectorization unnecessary.

> I'm curious if you could speak more to this? Is the concern that operations may get reordered?

Autovectorization brittleness is a large topic. The analysis is expensive, and vectorization may be impossible when it would violate program order or change observable side effects. On top of that, it often needs multiple expensive optimization phases coupled with a complex compiler IR and back-ends to target multiple platforms efficiently, which does not fit well with Go's compiler design (at least, that is my amateur impression from looking at its source code).
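To make the reordering part concrete, here is a small sketch (function names are made up for illustration). Floating-point addition is not associative, so splitting a sum into independent lanes, which is what a vectorized reduction does, can change the result. A compiler that must preserve the exact result of the source-order loop cannot do that split on its own, which is why C compilers generally only vectorize such reductions under fast-math-style flags.

```go
package main

import "fmt"

// SumSequential adds the elements strictly in program order.
func SumSequential(xs []float64) float64 {
	var s float64
	for _, x := range xs {
		s += x
	}
	return s
}

// SumTwoLanes mimics a 2-wide vectorized reduction: even- and odd-indexed
// elements accumulate in separate lanes that are combined at the end.
func SumTwoLanes(xs []float64) float64 {
	var lane0, lane1 float64
	i := 0
	for ; i+1 < len(xs); i += 2 {
		lane0 += xs[i]
		lane1 += xs[i+1]
	}
	if i < len(xs) { // leftover element when the length is odd
		lane0 += xs[i]
	}
	return lane0 + lane1
}

func main() {
	// 1e17 + 1 rounds back to 1e17 in float64, so the order of additions matters.
	xs := []float64{1e17, 1, -1e17, 1}
	fmt.Println(SumSequential(xs), SumTwoLanes(xs)) // prints 1 2
}
```

Same input, two different answers, and neither function is wrong by its own definition. The hand-written lane version is fine when the author opts into it; the brittleness comes from relying on a compiler to decide it silently.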

Go's compiler should not be treated as if it were in the same class as GCC or LLVM, because it is anything but. It is a grade below .NET's RyuJIT/ILC and OpenJDK's HotSpot. Go's design decisions and practices make it a somewhat easier optimization target than .NET CIL, which lets it maintain relative parity on general-purpose code that is light on abstractions (on abstraction-heavy code, Go starts to fall behind).

  • Your message applies to one particular Go compiler from Google. But since you mention gcc and llvm, it is also possible to use them to compile Go. Each implementation has different trade-offs in quality of generated code, runtime and language features.

    • Okay, I heard this argument enough times to know it's unreasonable but feel free to prove me wrong :)

      We have this go-attention library which seems like a perfect candidate for an alternate compiler. How do I get Go compiled to a reasonably good, autovectorized result here?
