Comment by johnisgood

8 months ago

I keep hearing that Go's C FFI is slow, why is that? How much slower is it in comparison to other languages?

Go's goroutines aren't plain C threads (blocking syscalls are magically made async), and Go's stack isn't a normal C stack (it's tiny and grown dynamically).

A C function won't know how to behave in Go's runtime environment, so to call a C function Go needs to make itself look more like a C program, call the C function, and then restore its magic state.

Other languages like C++, Rust, and Swift are similar enough to C that they can just call C functions directly. CPython is a C program, so it can too. Golang was brave enough to do fundamental things its own way, which isn't quite C-compatible.

  • > CPython is a C program

    Go (gc) was also a C program originally. It still had the same overhead back then as it does now. The implementation language is immaterial. How things are implemented is what is significant. Go (tinygo), being a different implementation, can call C functions as fast as C can.

    > ...so it can too.

    In my experience, the C FFI overhead in CPython is significantly higher than Go (gc). How are you managing to avoid it?

    • I think in the case of CPython it's just Python being slow to do anything. There are costs of the interpreter, GIL, and conversion between Python's objects and low-level data representation, but the FFI boundary itself is just a trivial function call.

  • I wonder if they should be using something like libuv to handle this. Instead of flipping state back and forth, create a playground for the C code that looks more like what it expects.

  • What about languages like Java, or other popular languages with GC?

    • Java FFI is slow and cumbersome, even more so if you're using the fancy auto-async from recent versions. The JVM community has mostly bitten the bullet and rewritten the entire world in Java rather than using native libraries; you only see JNI calls for niche things like high-performance linear algebra. IMO that was the right tradeoff, but it's also often seen as, e.g., the reason why Java GUIs on the desktop suck.

      Other languages generally fall into one of two camps: either they have a C-like stack and thread model and easy FFI (e.g. Ruby, TCL, OCaml), maybe with futures/async but not in an invisible/magic way, or they have a radically different threading model at the cost of FFI being slow and painful (e.g. Erlang). JavaScript is kind of special in having a C-like stack but being built around calling async functions from a global event loop, so it's technically in the first camp but feels more like the second.

    • C# does marshal/unmarshal for you, with a certain amount of GC-pinning required for structures while the function is executing. It's pretty convenient, although not frictionless, and I wouldn't like to say how fast it is.

Go's threading model involves a lot of tiny (but growable) stacks, and calling C functions directly on them would overflow the stack almost immediately.

Calling C safely is then slow because you have to allocate a larger stack, copy data around and mess with the GC.
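The "tiny but growable" part is easy to see from pure Go. The sketch below (the function name `depth` and the 128-byte padding are arbitrary choices for illustration) recurses 100,000 levels deep inside a fresh goroutine. That needs far more stack than a goroutine starts with, which is only a few kilobytes in current gc releases, and it works because the runtime grows and relocates the stack on demand. C code has no such mechanism, which is why cgo has to run it on a separate, conventionally sized stack.

```go
package main

import "fmt"

// depth recurses n levels and returns the number of frames visited (n + 1).
// The local buffer makes every frame occupy real stack space.
func depth(n int) int {
	var pad [128]byte
	pad[n%len(pad)] = 1
	if n == 0 {
		return int(pad[0])
	}
	return depth(n-1) + int(pad[n%len(pad)])
}

func main() {
	done := make(chan int)
	// Run in a new goroutine so the recursion starts on a small stack.
	go func() { done <- depth(100000) }()
	fmt.Println("frames visited:", <-done)
}
```

The same recursion on a fixed small stack would crash, so a runtime that wants both cheap goroutines and safe C calls ends up paying for a stack switch at the boundary.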

> How much slower is it in comparison to other languages?

It's about the same as most other languages that aren't specifically optimized for C calling. Considerably faster than Python.

Which is funny, as everyone on HN loves to extol the virtues of Python being a "C DSL" and never thinks twice about its overhead, but as soon as the word Go is mentioned it's like your computer is going to catch fire if you even try.

Emotion-driven development is a bizarre world.

I asked ChatGPT to summarize (granted, my prompt might not be ideal). Some points to note; here is just the first one in detail, the others are in the link at the bottom:

     Calling C from Go (or vice versa) often requires switching from Go's lightweight goroutine model to a full OS thread model because:
       - Go's scheduler manages goroutines on M:N threads, but C doesn't cooperate with Go's scheduler.
       - If C code blocks (e.g., on I/O or a mutex), Go must assume the worst and park the thread, spawning another to keep Go alive.
       - Cost: This means entering/exiting cgo is significantly more expensive than a normal Go call. There's a syscall-like overhead.

... This was only the first issue, but it then follows with "Go runtime can't see inside C to know whether it is allocating, blocking, spinning, etc.", then "Stack switching", "Thread Affinity and TLS", "Debug/Profiling support overhead", "Memory Ownership and GC barriers".

All here - https://chatgpt.com/share/688172c3-9fa4-800a-9b8f-e1252b57d0...

  • Just to roll with your way: https://chatgpt.com/share/688177c9-ebc0-8011-88cc-9514d8e167...

    Please do not take the numbers below at face value. I still expect an actual reply to my initial comment.

    Per-call overhead:

      C (baseline)       - ~30 ns
      Rust (unsafe)      - ~30 ns
      C# (P/Invoke)      - ~30-50 ns
      LuaJIT             - ~30-50 ns
      Go (cgo)           - ~40-60 ns
      Java (22, FFM)     - ~40-70 ns
      Java (JNI)         - ~300-1000 ns
      Perl (XS)          - ~500-1000 ns
      Python (ctypes)    - ~10,000-30,000 ns
      Common Lisp (SBCL) - ~500-1500 ns

    Seems like Go is still fast enough compared with other garbage-collected languages, so I am not sure the "slow FFI" reputation is fair to Go.

    • Java now has FFM, which is far better and simpler than JNI, FWIW. And ChatGPT says:

      Language/API    | Call Overhead (no-op C) | Notes
      Go (cgo)        | ~40–60 ns               | Stack switch + thread pinning
      Java FFM        | ~50 ns (downcall)       | Similar to JNI, can be ~30 ns with isTrivial()
      Java FFM (leaf) | ~30–40 ns               | Optimized (isTrivial=true)
      JNI             | ~50–60 ns               | Slightly slower than FFM
      Rust (unsafe)   | ~5–20 ns                | Near-zero overhead
      C# (P/Invoke)   | ~20–50 ns               | Depends on marshaling
      Python (cffi)   | 1000–10000 ns           | Orders of magnitude slower

    • > Rust (unsafe)

      As if there is an alternative :)

      More seriously, it’s “unsafe” from the perspective of the library calling into C, but usually “safe” for any layer above.
