Comment by neonsunset

8 months ago

The code there is written in a fairly auto-vectorizable way. But the actual capabilities of Go's compiler are very far away from this despite public expectation (and autovectorization is brittle, writing inference or training in a way that relies on it is the last thing you want). To put it in perspective, until 2021 Go was always passing the data on the stack on function calls. It has improved since then but the overall design aims to ensure common scenarios are fast (e.g. comparisons against string literals are unrolled) but once you venture outside that or if it's an optimization that requires more compiler complexity - Go is far less likely to employ it.

> and autovectorization is brittle, writing inference or training in a way that relies on it is the last thing you want)

I'm curious if you could speak more to this? Is the concern that operations may get reordered?
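To make the reordering question concrete, here is a minimal sketch (function names are mine, purely illustrative) of why a compiler cannot freely vectorize a floating-point reduction. Floating-point addition is not associative, so summing in 4 interleaved lanes, the way a 4-wide SIMD reduction would, can give a different result than strict program order. This is the reason GCC and LLVM generally need something like `-ffast-math` before they vectorize such loops.

```go
package main

import "fmt"

// sumSequential adds in strict program order, exactly as the source specifies.
func sumSequential(xs []float32) float32 {
	var s float32
	for _, x := range xs {
		s += x
	}
	return s
}

// sumLanes4 adds the way a 4-wide vectorized reduction would: element i goes
// into accumulator i%4, and the four partial sums are combined at the end.
// Assumption: len(xs) is a multiple of 4, to keep the sketch short.
func sumLanes4(xs []float32) float32 {
	var acc [4]float32
	for i := 0; i+4 <= len(xs); i += 4 {
		acc[0] += xs[i]
		acc[1] += xs[i+1]
		acc[2] += xs[i+2]
		acc[3] += xs[i+3]
	}
	return (acc[0] + acc[1]) + (acc[2] + acc[3])
}

func main() {
	// Near 1e8 adjacent float32 values are 8 apart, so 1e8 + 1 rounds back to 1e8.
	xs := []float32{1e8, 1, 1, 1, -1e8, 1, 1, 1}
	fmt.Println(sumSequential(xs), sumLanes4(xs)) // prints: 3 6
}
```

Same inputs, same additions, different grouping, different answer. A compiler that promises to preserve the program's observable results has to keep the sequential order unless the language or a flag allows otherwise.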

> To put it in perspective, until 2021 Go was always passing the data on the stack on function calls. It has improved since then but the overall design aims to ensure common scenarios are fast (e.g. comparisons against string literals are unrolled) but once you venture outside that or if it's an optimization that requires more compiler complexity - Go is far less likely to employ it.

I agree with this assessment.

The individual operations in the repository (e.g., dot product) look like they could be autovectorized. I'm assuming they aren't because of the use of a slice. I'm mildly curious if it could be massaged into something autovectorized.
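For reference, a rough sketch of what "massaging" a slice-based dot product usually looks like in Go. This is not the repository's code; `dot` and `dot4` are hypothetical names. The reslice `b = b[:len(a)]` is a common idiom intended to help the compiler drop the per-element bounds check on `b[i]`, and the hand-unrolled variant uses independent accumulators to shorten the dependency chain. Neither version is claimed to produce SIMD instructions.

```go
package main

import "fmt"

// dot is a plain scalar dot product over two slices.
func dot(a, b []float32) float32 {
	b = b[:len(a)] // panics up front if b is shorter than a
	var sum float32
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// dot4 unrolls the loop by hand into four independent accumulators.
// It is still scalar code; it only exposes more instruction-level parallelism.
func dot4(a, b []float32) float32 {
	b = b[:len(a)]
	var s0, s1, s2, s3 float32
	i := 0
	for ; i+4 <= len(a); i += 4 {
		s0 += a[i] * b[i]
		s1 += a[i+1] * b[i+1]
		s2 += a[i+2] * b[i+2]
		s3 += a[i+3] * b[i+3]
	}
	// Handle the leftover elements when len(a) is not a multiple of 4.
	for ; i < len(a); i++ {
		s0 += a[i] * b[i]
	}
	return (s0 + s1) + (s2 + s3)
}

func main() {
	a := []float32{1, 2, 3, 4, 5}
	b := []float32{5, 4, 3, 2, 1}
	fmt.Println(dot(a, b), dot4(a, b)) // prints: 35 35
}
```

Whether either form gets vectorized can be checked by dumping the generated assembly with `go build -gcflags=-S` and looking for packed instructions.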

Most of my observations re: autovectorization in Go have been on fixed-size vectors and matrices where SSE2 instructions are pretty readily available and loop unrolling is pretty simple.

I'm curious what it would produce with the matrix in a single slice rather than independent allocations. Not curious enough to start poking at it, just curious enough to ramble about it conversationally.
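To illustrate what I mean by a single slice, here is a minimal sketch with a hypothetical `Matrix` type (not taken from the repository). All rows live in one contiguous row-major allocation instead of a `[][]float32` where each row is allocated separately, and a row is just a subslice of the backing array.

```go
package main

import "fmt"

// Matrix stores Rows*Cols values in one contiguous row-major slice.
type Matrix struct {
	Rows, Cols int
	Data       []float32 // len(Data) == Rows*Cols
}

// NewMatrix allocates a zeroed matrix with a single backing array.
func NewMatrix(rows, cols int) *Matrix {
	return &Matrix{Rows: rows, Cols: cols, Data: make([]float32, rows*cols)}
}

// Row returns row r as a subslice sharing the backing array (no copy).
func (m *Matrix) Row(r int) []float32 {
	return m.Data[r*m.Cols : (r+1)*m.Cols]
}

// MulVec computes out = m * x. It assumes len(x) >= m.Cols and len(out) >= m.Rows.
func (m *Matrix) MulVec(x, out []float32) {
	for r := 0; r < m.Rows; r++ {
		var sum float32
		for i, v := range m.Row(r) {
			sum += v * x[i]
		}
		out[r] = sum
	}
}

func main() {
	m := NewMatrix(2, 3)
	copy(m.Data, []float32{1, 2, 3, 4, 5, 6})
	out := make([]float32, 2)
	m.MulVec([]float32{1, 0, 1}, out)
	fmt.Println(out) // prints: [4 10]
}
```

The layout mainly helps cache locality and cuts allocations. It does not by itself make the compiler emit SIMD instructions.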

  • > The individual operations in the repository (e.g., dot product) look like they could be autovectorized. I'm assuming they aren't because of the use of a slice. I'm mildly curious if it could be massaged into something autovectorized.

    > Most of my observations re: autovectorization in Go have been on fixed-size vectors and matrices where SSE2 instructions are pretty readily available and loop unrolling is pretty simple.

    Go does not have any form of autovectorization. The only way to access SIMD instructions in Go is through functions written in Go assembly (Goasm). Moreover, Go does not ship SIMD primitives in its math library. Such primitives, implemented as inlineable functions that use SIMD instructions, would remove the need for auto-vectorization in the first place.

    > I'm curious if you could speak more to this? Is the concern that operations may get reordered?

    Autovectorization brittleness is a large topic. The analysis is expensive, and vectorization may be impossible when it would violate program order or change observable side effects. In addition, it often needs multiple expensive optimization phases coupled with a complex compiler IR and back-ends to efficiently target multiple platforms, which does not fit well with Go's compiler design (at least, such is my amateur impression from looking at its source code).

    Go's compiler should not be treated as if it's in the same class as GCC or LLVM, because it is anything but. It is a grade below .NET's RyuJIT/ILC and OpenJDK's HotSpot, though its design decisions and practices make Go a somewhat easier optimization target than .NET CIL, which allows it to maintain relative parity in general-purpose code that is light on abstractions (if the code is heavy on those, Go starts to fall behind).

    • Your message applies to one particular Go compiler, the one from Google. But since you mention GCC and LLVM, it is also possible to use them to compile Go. Each implementation has different trade-offs in quality of generated code, runtime and language features.
