Comment by alexozer
9 months ago
So am I identifying the bottlenecks that motivate this design correctly?
1. Go FFI is slow
2. Per-proto generated code specialization is slow, because of icache pressure
I know there's more to the optimization story here, but I guess these are the primary motivations for the VM over just better code generation or implementing a parser in non-Go?
I know that Java resisted improving their FFI for years because they preferred that the JIT get the extra resources, and that customers not bail out of Java every time they couldn't figure out how to make it faster. There's a case I recall from when HotSpot was still young: the Java GUI team moved part of the graphics pipeline to the FFI in one release, HotSpot got faster in the next, and then they rolled back the changes because the pipeline was now faster without the FFI.
But eventually your compiler is good enough that the FFI is now your bottleneck, and you need to do something.
3. The use case is dynamic schemas and access is through the reflection API. Thus PGO has to be done at runtime...
I keep hearing that Go's C FFI is slow, why is that? How much slower is it in comparison to other languages?
Go's goroutines aren't plain C threads (blocking syscalls are magically made async), and Go's stack isn't a normal C stack (it's tiny and grown dynamically).
A C function won't know how to behave in Go's runtime environment, so to call a C function Go needs to make itself look more like a C program, call the C function, and then restore its magic state.
Other languages like C++, Rust, and Swift are similar enough to C that they can just call C functions directly. CPython is a C program, so it can too. Golang was brave enough to do fundamental things its own way, which isn't quite C-compatible.
> CPython is a C program
Go (gc) was also a C program originally. It still had the same overhead back then as it does now. The implementation language is immaterial. How things are implemented is what is significant. Go (tinygo), being a different implementation, can call C functions as fast as C can.
> ...so it can too.
In my experience, the C FFI overhead in CPython is significantly higher than Go (gc). How are you managing to avoid it?
I wonder if they should be using something like libuv to handle this. Instead of flipping state back and forth, create a playground for the C code that looks more like what it expects.
What about languages like Java, or other popular languages with GC?
Go's threading model involves a lot of tiny (but growable) stacks, and calling a C function on one would almost immediately overflow the stack.
Calling C safely is therefore slow, because you have to switch to a larger stack, copy data around, and coordinate with the GC.
> How much slower is it in comparison to other languages?
It's about the same as most other languages that aren't specifically optimized for C calling. Considerably faster than Python.
Which is funny, as everyone on HN loves to extol the virtues of Python being a "C DSL" and never thinks twice about its overhead, but as soon as the word Go is mentioned it's like your computer is going to catch fire if you even try.
Emotion-driven development is a bizarre world.
Yeah, that is why I am asking.
I've asked ChatGPT to summarize (granted, my prompt might not be ideal), but there are some points to note — here just the first in detail, the others in the link at the bottom:
... This was only the first issue, but it then follows with "Go runtime can't see inside C to know is it allocating, blocking, spinning, etc.", then "Stack switching", "Thread Affinity and TLS", "Debug/Profiling support overhead", and "Memory Ownership and GC barriers".
All here - https://chatgpt.com/share/688172c3-9fa4-800a-9b8f-e1252b57d0...
Just to roll with your way: https://chatgpt.com/share/688177c9-ebc0-8011-88cc-9514d8e167...
Please do not take the numbers below at face value. I still expect an actual reply to my initial comment.
Per-call overhead:
Seems like Go is still fast enough compared to other garbage-collected languages, so I am not sure the criticism is fair to Go.