Comment by carefree-bob

17 days ago

The performance gains from bit-level control over memory come from managing the layout to ensure cache locality and do things like SIMD - and nowadays even GPU kernel offload. Enormous performance gains.

I agree that it really isn't about garbage collection pauses. I haven't heard people focus on "eliminating GC pauses" when they talk about low-level languages; they spend their time talking about SIMD, GPU kernels, and cache misses. If Go could add these features, it would be a performance monster.

Go defines structs the same way C does, so it already encourages thinking about and optimising the physical data layout. It also recently added experimental support for SIMD intrinsics: https://go.dev/doc/go1.26#simd . Nothing on the GPU side yet, though I wouldn't be surprised to see it there eventually too :)

  • Yes, I know Go's structs are similar to C in terms of syntax, but does the Go compiler guarantee the same bitwise layout for its data structures? Most GC languages add metadata to the data structures to track GC status, and this changes both the memory layout and the word alignment, which then sometimes forces the language to add extra padding to maintain alignment. And this nests as you put one struct inside another, or an array inside a struct.

    Now you have "fat arrays" and "fat structs", so instead of grabbing a pointer and loading the next 128 bits into a register and doing an operation, you have to grab the pointer, read out data from individual elements, combine them, create a new element with the combined data, and then you have 128 bits. But even then, you don't know whether you have 128 bits or not. Some GC-specific metadata might have been added by the compiler (and probably was).

    Bottom line, it's very hard in the GC world to have bit-level control over memory layout, even if the user-level "struct" syntax is the same. And one consequence of that is that you can't just "do" SIMD in Go. You have to wait for Go to expose a library that does this for you, and you will always be limited by whatever types of unpacking/repacking the language designers allowed you to do.

    Or you are stuck hoping the compiler is very smart, which it never is, and making compilers smarter costs huge compile times for marginal gains.

    So it's not about GC pauses so much as no longer having access to memory layouts.

    • > does the Go compiler guarantee the same bitwise layout for its data structures

      It probably won't be fully 1:1 with C, but it's good enough that you can write code like this and it works: https://github.com/fsnotify/fsnotify/blob/main/backend_inoti... (unix.InotifyEvent is just a Go struct: https://pkg.go.dev/golang.org/x/sys/unix#InotifyEvent)

      > Now you have "fat arrays" and "fat structs", so instead of grabbing a pointer and loading the next 128 bits into memory and doing an operation, you have to grab the pointer, read out data from individual elements, combine them, create a new element with the combined data, and then you have a 128 bits.

      That is not how it works: you get real pointers, and you can even do pointer arithmetic on them with the unsafe package.

      > Most GC languages add metadata to the data structures to track GC status, and this changes both the memory layout and the word alignment, which then sometimes forces the language to add extra padding to maintain alignment

      Go's GC uses separate memory regions to track GC metadata. It does not embed this information into structs, arrays, etc., directly.

      > And one consequence of that is that you can't just "do" SIMD in Go. You have to wait for Go to expose a library that does this for you, and you will always be limited by what types of unpacking/repacking the language designers allowed you to do.

      You very much can, thanks to what I described above. You'll have to write assembly (Go supports assembly), which is used in e.g. crypto libraries not just for performance, but also to ensure constant-time operation.

      The downside of using assembly is that it doesn't support inlining, and there's a small shim that keeps the ABI backwards compatible with the original way functions were called (arguments on the stack, whereas the newer ABI uses registers). So you need to write your loops in assembly too, to eliminate the per-call overhead. The SIMD package solves this issue by allowing code inlining.
