Comment by agentultra

7 hours ago

I’ve often noted that most projects of a certain size tend to implement some form of garbage collection and allocation.

Perhaps general purpose systems of these sorts aren’t suitable for specialized applications… but I don’t get the “hate” (if you can call it that) which some programmers have for GC.

As someone who likes GCs, I understand it.

GCs involve a lot of tradeoffs. It's impossible to check every box, which means there's always going to be something to gripe about.

If you want your GC to be memory efficient you are likely trading off throughput.

If you want your GC to allocate fast and avoid memory fragmentation, you are likely over-provisioning the heap.

If you want to minimize CPU time in GC, you'll likely increase pause time.

If you want to minimize pause time, you'll likely increase CPU time doing a GC.

All these things can make someone ultimately hate a GC.

However, if you want a programming language which deals with complicated memory lifetimes (think concurrent data structures) then a GC is practically essential. It's a lot harder to correctly implement something like Java's "ConcurrentHashMap" in C++ or Rust.
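To make that concrete, here's a minimal Java sketch (class and key names are mine) of the kind of thing `ConcurrentHashMap` gives you for free: two threads updating the same key with no external locking, where the GC quietly handles the lifetime of internal nodes that concurrent operations may still be reading.

```java
import java.util.concurrent.ConcurrentHashMap;

public class SharedCounter {
    public static void main(String[] args) throws InterruptedException {
        // Shared map: safe for concurrent updates without external locking.
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                // merge() is an atomic read-modify-write on the key.
                counts.merge("hits", 1, Integer::sum);
            }
        };

        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();

        // Both threads' increments land: no lost updates.
        System.out.println(counts.get("hits")); // 20000
    }
}
```

Doing the equivalent in C++ means deciding, for every internal node, when it is safe to free it while other threads may still be traversing it — exactly the problem the GC solves.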

For specific use cases, it's not only more ergonomic, it can be more performant. Look at Linux kernel use of RCU. The important thing is to maintain control over the allocation strategy and lifetime of data depending on the use case for systems programming. Defaulting to a GC just removes your control which is the problem. GC itself is not problematic.
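The RCU idea — readers proceed without locks while writers publish a new version — can be approximated in a GC'd language with an immutable snapshot behind an atomic reference. This is only a sketch of the pattern (names are mine), not kernel RCU: in C, RCU needs explicit grace periods to know when old versions can be freed, whereas here the GC reclaims an old snapshot once the last reader drops it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class RcuStyle {
    // Writers swap in a fresh immutable list; readers never block.
    private static final AtomicReference<List<String>> config =
        new AtomicReference<>(List.of("a", "b"));

    static List<String> read() {
        return config.get(); // wait-free read: just a reference load
    }

    static void update(String extra) {
        config.updateAndGet(old -> {
            var copy = new ArrayList<>(old);
            copy.add(extra);
            return List.copyOf(copy); // publish an immutable snapshot
        });
    }

    public static void main(String[] args) {
        List<String> snapshot = read(); // a reader holds the old version
        update("c");                    // a writer publishes a new one
        // The reader's snapshot is unchanged; new reads see the update.
        System.out.println(snapshot.size() + " " + read().size()); // 2 3
    }
}
```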

> but I don’t get the “hate”

I will put forward some arguments. I do not believe all of them:

GC is for lazy programmers who do not know how to manage memory.

GC takes aspects of memory allocation out of my control. If I control all the things I can get the best performance.

If you care about performance: when your program does not need GC, GC is pure overhead.

If you care about performance: if you must use GC then you must have a high-performance GC available. So the question is not just GC/no-GC but you have to worry about details of the GC -- it's a leaky abstraction.

If you care about average latency: if you must use GC then you must have a low-latency GC available (i.e. a GC with low and bounded pause times).

If you care about meeting real-time deadlines: if you must use GC then you must have a GC that is guaranteed to meet your timing constraints.

Corollary of the previous three points: GC is not just GC. Someone else's GC is often not the GC you want. Your program requirements can impose strong requirements on the GC algorithm, and you don't always have the necessary control over the GC.

For a particular language, GC algorithm and/or implementation can be a moving target. If the GC developer's goals don't stay aligned with your requirements then you are hosed.

GC results in unpredictable memory allocation performance. Ironing out these performance issues (e.g. by avoiding allocations, pooling objects, etc.) is just as much work as manual memory management, so why bother with GC?
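Object pooling, as mentioned above, looks like this in a GC'd language (a minimal sketch; the pool class is hypothetical). Note the catch: once you pool, you're back to a manual acquire/release discipline, which is the point of the argument.

```java
import java.util.ArrayDeque;

// A minimal object pool: reuse buffers instead of allocating fresh ones,
// trading GC pressure for manual acquire/release bookkeeping.
public class BufferPool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int size;
    int allocations = 0; // counts real allocations, for demonstration

    BufferPool(int size) { this.size = size; }

    byte[] acquire() {
        byte[] buf = free.poll();
        if (buf == null) { allocations++; buf = new byte[size]; }
        return buf;
    }

    void release(byte[] buf) { free.push(buf); }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(4096);
        for (int i = 0; i < 1000; i++) {
            byte[] b = pool.acquire();
            // ... use the buffer ...
            pool.release(b); // forget this and the pool degenerates
        }
        // Every iteration reused the same buffer: one real allocation.
        System.out.println(pool.allocations); // 1
    }
}
```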

Unless you have allocation patterns or other requirements that really need GC[0], it's easier to just avoid GC.

[0] e.g. some lock-free algorithms depend on the presence of GC
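A classic example of [0] is the Treiber stack, sketched below in Java. The bare CAS loop is only safe because of GC: in C++, a popped node's memory could be freed and reused at the same address, so another thread's compare-and-set could succeed against a stale-but-recycled pointer (the ABA problem). With a GC, a node's address cannot be recycled while any thread still holds a reference to it.

```java
import java.util.concurrent.atomic.AtomicReference;

// Treiber stack: a lock-free stack whose correctness leans on GC.
public class TreiberStack<T> {
    private record Node<U>(U value, Node<U> next) {}
    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    public void push(T v) {
        Node<T> h;
        do { h = head.get(); } while (!head.compareAndSet(h, new Node<>(v, h)));
    }

    public T pop() {
        Node<T> h;
        do {
            h = head.get();
            if (h == null) return null; // empty stack
            // Between get() and compareAndSet(), h may be popped by another
            // thread -- but GC guarantees its address isn't reused, so a
            // successful CAS really means the head was unchanged.
        } while (!head.compareAndSet(h, h.next()));
        return h.value();
    }

    public static void main(String[] args) {
        TreiberStack<Integer> s = new TreiberStack<>();
        s.push(1); s.push(2);
        System.out.println(s.pop() + " " + s.pop()); // 2 1
    }
}
```

Without a GC you need hazard pointers, epochs, or reference counting to get the same guarantee, each with its own complexity and cost.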

  • My main complaint about GCs is that they only clean up one type of garbage: memory. Maybe you can increase your file descriptor limit and let them deal with some other resources via finalizers. But they aren't tuned for that, and there will be some types of resources (locks, temporary files, worker threads, subprocesses, ...) that you need to manage on your own. So now you have at least two forms of resource management in your program.
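Java itself concedes this point: non-memory resources get their own deterministic mechanism, try-with-resources, precisely because finalization is "eventually, maybe". A small sketch of the two forms of resource management coexisting:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TwoKindsOfCleanup {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        // The file handle is closed deterministically when this block
        // exits, whether or not the GC ever runs; a finalizer would
        // close it at some unspecified later time, if at all.
        try (BufferedWriter w = Files.newBufferedWriter(tmp)) {
            w.write("hello");
        }
        System.out.println(Files.readString(tmp)); // hello
        // The temp file on disk is also a resource the GC won't clean up.
        Files.delete(tmp);
    }
}
```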