Comment by gf000
15 hours ago
I think GC's definition is pretty clear cut. How is counting references to determine when a lifetime ends materially different from any other way of doing the same thing? There is even a paper showing that one tracks liveness while the other tracks "deadness": they are literally approaching the same problem from opposite ends.
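Here is a toy sketch of that duality. It assumes a heap modelled as a dict from object name to the names it references; the heap layout and function names are illustrative and not taken from the paper. Tracing starts from the roots and finds what is live. Reference counting starts from objects whose count is zero and cascades. On an acyclic heap both find the same garbage. On a cycle, plain refcounting never reaches zero, which is why refcounting systems usually add a cycle collector.

```python
from collections import deque


def trace_garbage(heap, roots):
    """Tracing: mark everything reachable from the roots; the rest is garbage."""
    live = set()
    work = deque(roots)
    while work:
        obj = work.popleft()
        if obj in live:
            continue
        live.add(obj)
        work.extend(heap[obj])
    return set(heap) - live


def refcount_garbage(heap, roots):
    """Reference counting: start from objects with count 0 and cascade decrements."""
    counts = {obj: 0 for obj in heap}
    for obj in roots:
        counts[obj] += 1
    for refs in heap.values():
        for target in refs:
            counts[target] += 1
    dead = set()
    work = deque(obj for obj, c in counts.items() if c == 0)
    while work:
        obj = work.popleft()
        dead.add(obj)
        for target in heap[obj]:  # freeing obj drops its outgoing references
            counts[target] -= 1
            if counts[target] == 0:
                work.append(target)
    return dead


acyclic = {"a": ["b"], "b": ["c"], "c": [], "d": ["e"], "e": []}
cyclic = {"a": [], "x": ["y"], "y": ["x"]}
print(sorted(trace_garbage(acyclic, ["a"])), sorted(refcount_garbage(acyclic, ["a"])))
# ['d', 'e'] ['d', 'e']
print(sorted(trace_garbage(cyclic, ["a"])), sorted(refcount_garbage(cyclic, ["a"])))
# ['x', 'y'] []
```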
If anything, I often see a bias against tracing GCs from the people misusing the term. They "hype up" their language of choice as better for not having (tracing) GC, when it usually just has ref counting, which on many metrics is actually worse given equal usage. Rust/C++ get away with that because they only use it for a handful of objects, with other lifetimes driven by RAII, which is pretty much just compile-time decidable ref counting.
Right, and there are differences within tracing GCs that are just as big as those between refcounting (or even manual malloc/free) and tracing. For example, Go uses tracing to determine when an object's lifetime ends. But the moving collectors in Java, .NET, and V8 don't know and don't care when objects die, and they have no "free" operation at all. In many ways, the performance profiles (favouring smaller footprint or higher throughput) of memory management in C++, Rust, Python, and Go have more in common with each other than with those of Java, .NET, V8, and Zig, which in turn share a similar profile among themselves (arenas, like moving collectors, don't need or want to know when an object's lifetime ends).
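To make "don't know and don't care when objects die" concrete, here is a minimal Cheney-style copying collector over a toy heap. It assumes a heap that is a list of objects, each a list of the indices it points to; all names are hypothetical, and this is not how the JVM, .NET, or V8 are actually implemented. It only ever touches objects reachable from the roots. Dead objects are never visited, and there is no free operation: the old space is simply abandoned wholesale.

```python
def copy_collect(from_space, roots):
    """Toy Cheney-style copying collection.

    from_space: list of objects, each a list of indices of the objects it references.
    roots: indices into from_space.
    Returns (to_space, new_roots). Dead objects are never visited or freed.
    """
    to_space = []
    forwarding = {}  # old index -> new index, so shared/cyclic objects are copied once

    def evacuate(old):
        if old not in forwarding:
            forwarding[old] = len(to_space)
            to_space.append(list(from_space[old]))  # fields still hold old indices
        return forwarding[old]

    new_roots = [evacuate(r) for r in roots]
    scan = 0
    while scan < len(to_space):  # to_space doubles as the work queue
        to_space[scan] = [evacuate(old) for old in to_space[scan]]
        scan += 1
    return to_space, new_roots


# Objects 0 and 2 form a live cycle; objects 1 and 3 are dead.
print(copy_collect([[2], [], [0], []], [0]))  # ([[1], [0]], [0])
```

The collector's work is proportional to the live set, not to the heap size, which is where moving collectors get their throughput.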
Another distinction without a difference, one that really just gives a name to a misconception, is the notion of "a runtime". When I learnt C in the late 80s or early 90s, the book said something like, "C is not just the language, but a rich runtime". Indeed, modern malloc/free implementations mean that a C program ends up needing a larger and more elaborate runtime than a program in some educational language that uses a trivial implementation of a mark-and-sweep collector. Modern malloc/free allocators also sometimes come with an impressive set of tuning knobs. It's just that people who haven't had a lot of experience writing large programs in low-level languages don't know about them (or they just work to avoid allocations as much as possible, because that's what they've been told to do).
> Like there is even a paper that shows that one is tracking liveness, while the other tracks "deadness" and they are literally going at the same thing from different ends.
https://dl.acm.org/doi/10.1145/1035292.1028982
I think a lot of people just want to be able to discuss different areas of the automatic memory management design space separately, and maintaining the distinction between reference counting and garbage collection (meaning tracing GCs) lets them do that.
As for me, I consider refcounting and GC overlapping categories. I am perfectly willing to call CPython’s reference counting plus cycle collector a form of garbage collection, because it is transparent to the programmer. Every memory management technique has tradeoffs and pathological edge cases, but since you don’t have to consider them in the ordinary course of programming, I’d say it counts. If you had to break cycles manually, or annotate which references should be counted, as in the C++ stdlib (std::shared_ptr, with std::weak_ptr to break cycles), I’d call that refcounting but not GC.
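A small CPython-specific demonstration of that combination (the `Node` class and `demo` function are made up for illustration): an acyclic object is freed as soon as its refcount hits zero, while a two-object cycle survives until the cycle collector runs. This relies on CPython's refcounting; on implementations without it, such as PyPy, the first result may differ.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None  # a reference slot we can use to build a cycle


def demo():
    was_enabled = gc.isenabled()
    gc.disable()  # stop the cycle collector from running on its own mid-demo
    try:
        # Acyclic case: the refcount drops to zero on `del`, so CPython frees it at once.
        a = Node()
        a_ref = weakref.ref(a)
        del a
        freed_by_refcount = a_ref() is None

        # Cyclic case: x and y keep each other's refcount above zero.
        x, y = Node(), Node()
        x.other, y.other = y, x
        x_ref = weakref.ref(x)
        del x, y
        alive_before_collect = x_ref() is not None

        gc.collect()  # the cycle collector finds the unreachable cycle and frees it
        freed_by_cycle_collector = x_ref() is None
    finally:
        if was_enabled:
            gc.enable()
    return freed_by_refcount, alive_before_collect, freed_by_cycle_collector


print(demo())  # (True, True, True) on CPython
```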
> I think a lot of people just want to be able to discuss different areas of the automatic memory management design space separately, and maintaining the distinction between reference counting and garbage collection (meaning tracing GCs) lets them do that.
The problem is that there are many differences in memory management techniques that offer different tradeoffs, and the difference between refcounting and tracing is not necessarily the biggest of them.
For example, one of the most important distinctions in memory management is whether it optimises for footprint or speed (or some compromise), and the line isn't where people who don't understand memory management think it is. It can matter (often a great deal) whether you determine that an object is dead dynamically (say, by counting references) or statically (by manually writing free or by having the language track lifetimes), but it doesn't matter as much as whether or not the mechanism needs to know when objects are dead in the first place. So reference counting, manual free, static lifetimes, and even non-moving mark-and-sweep tracing collectors (like Go's) generally optimise for footprint at the expense of speed (although different allocators can have some control over that tradeoff), while arenas and tracing moving collectors optimise for speed at the expense of footprint (although here, too, they have some control over the tradeoff). So the line for this super-important tradeoff is between [manual, static, refcounting] and [arenas, moving tracing]; non-moving tracing collectors are somewhere in between but may be closer to the first group.
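A minimal bump-pointer arena, as a sketch of why arenas sit on the "speed" side of that line. The `Arena` class is hypothetical and hands out offsets into a `bytearray` rather than real pointers. Allocation is one round-up and one addition, and there is no per-object free, so memory held by dead objects only comes back when the whole arena is reset. That is the footprint cost.

```python
class Arena:
    """Toy bump-pointer arena over one preallocated buffer (illustrative only)."""

    def __init__(self, capacity: int):
        self.buf = bytearray(capacity)
        self.top = 0  # offset of the first free byte

    def alloc(self, size: int, align: int = 8) -> int:
        """Return the offset of a fresh block; allocation is just a pointer bump."""
        start = (self.top + align - 1) & ~(align - 1)  # round up; align must be a power of two
        if start + size > len(self.buf):
            raise MemoryError("arena exhausted")
        self.top = start + size
        return start

    def reset(self) -> None:
        """Free every object at once: no per-object free and no liveness tracking."""
        self.top = 0


arena = Arena(64)
print(arena.alloc(10), arena.alloc(3), arena.top)  # 0 16 19
arena.reset()
print(arena.alloc(4))  # 0
```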
People who don't understand memory management and may not have a lot of experience in low-level programming sometimes think that manual or statically-determined freeing must be fast because low-level languages, which inexperienced people think are fast, use them. In fact, low-level languages have some concerns that are much more important than speed and that preclude optimisations such as moving pointers. To get around that performance handicap, these languages try to use their heap memory management as little as possible, because their constraints force them into a rather slow technique.
"Speed" is also ambiguous between latency and throughput. You seem to be using "speed" here as a synonym for throughput. Because of Little's Law, the memory consumed by objects that are dead but not yet deallocated is directly proportional to deallocation latency, so "low footprint" also generally means "low latency", while increasing throughput by amortizing deallocation overhead at the expense of latency increases memory usage for the same reason.
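The Little's Law point can be made with a little arithmetic. The function name and workload numbers below are hypothetical. In a steady state, objects die at roughly the rate they are allocated, so the memory held by dead-but-unreclaimed objects is that rate times the average delay before reclamation.

```python
def floating_garbage_bytes(alloc_rate_bytes_per_s, reclaim_delay_s):
    """Little's Law, L = lambda * W, applied to reclamation.

    Assumes a steady state in which objects die at the rate they are allocated;
    the result is the average number of bytes held by dead-but-unreclaimed objects.
    """
    return alloc_rate_bytes_per_s * reclaim_delay_s


rate = 200e6  # hypothetical workload: 200 MB/s of allocation
print(floating_garbage_bytes(rate, 0.0))   # idealised immediate reclamation: 0.0
print(floating_garbage_bytes(rate, 0.05))  # 50 ms average delay: 10000000.0 (about 10 MB)
```

Doubling the delay to amortise more work per collection doubles that memory, which is the latency/throughput/footprint coupling in the comment.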
I don't really disagree much with what you said. My favored PLang, Nim (https://nim-lang.org/ -- it has both `ref` and `ptr` styles of pointer, one auto-managed, one manually managed), even changed its `nim c --gc=x` command-line option to `nim c --mm=x` a while back, and I was in favor of that change.
However, it does inspire me to write a bit. The kernel of all this terminology confusion is that industrial programmers are under-exposed not just to academic terminology, but also to the very design space you mention (which has always been nicely covered by Jones' outstanding book). Just to take an example from the root of this thread:
>widespread common usage of the term “garbage collected programming language” which specifically contrasts manual languages like C++ or Rust against garbage collected ones
Boehm-Weiser conservative collection for C, among the most manual languages of all, predates C's very first standard (ANSI, 1989).
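A toy illustration of what "conservative" means here. The heap layout, fake addresses, and function name are all hypothetical. The real Boehm-Demers-Weiser collector scans stacks, registers, and static data against its own heap-block metadata. Without compiler aid, a collector cannot tell an integer from a pointer. So any word that happens to equal an object's address must be treated as a reference, and that can keep garbage alive.

```python
def conservative_mark(heap, root_words):
    """Toy conservative marking (illustrative only).

    heap: maps a fake object address to the raw words stored in that object.
    root_words: raw words from the stack/registers, with no type information.
    Any word equal to a heap address is assumed to be a pointer.
    """
    marked = set()
    work = [w for w in root_words if w in heap]
    while work:
        addr = work.pop()
        if addr in marked:
            continue
        marked.add(addr)
        # Object contents are scanned the same way: ints and pointers look alike.
        work.extend(w for w in heap[addr] if w in heap)
    return marked


heap = {0x1000: [0x2000, 42], 0x2000: [], 0x3000: []}
# 12288 is meant as a plain integer, but it equals 0x3000, so it pins that object.
print(sorted(hex(a) for a in conservative_mark(heap, [0x1000, 12288])))
# ['0x1000', '0x2000', '0x3000']
```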
This underexposure is itself downstream of the kinds of oversimplifications/lies of marketing, and in this particular case it came from Java. The evolution I witnessed was roughly: 1) linking Boehm with -lgc and deleting (or #define'ing away) all your `free()` calls is conservative (to be precise you need compiler aid), and a lot of programmers are "not perfect==awful" personality types; 2) Sun Microsystems wants to leverage the many reliability issues with C code to become The Platform, and spends gobs of money to win hearts & minds, partly succeeding; 3) part of its ad-warfare against the then-Wintel hegemony, and/or tutorials/introductory material for Junior Programmers (often the target of "be more reliable" material), plays fast & loose with GC terminology, because marketing structurally plays fast & loose, for fun but mostly profit; 4) because human language really does == language usage a la Quine, everyone in the industry redefines "GC" to bind it to a programming language instead of to a specific run-time; 5) industry & academics use different language, confusion ensues, and so here we are.
This is not even the 100th time that marketing forces, explicit or implicit, have sown confusion like this. If you believe most people don't need much of what they spend money on, then confusion is arguably intrinsic to the marketing of ideas/products. The highly misleading but suggestive metaphorical language used all over "AI", in both research and product lines, is a more current case of this: anyone who knows much has to qualify with "not AGI" or other such junk just to have a conversation.
So, what is my point? Basically just that the larger problem here will persist as long as there is money to be made/attention to be garnered by sowing confusion/having people talk past each other/think some product is more than it really is. I have no meta-strategy in my back pocket to block these successful confusions, but it does seem worth being aware of.
By that definition even C has garbage collection. Automatic storage duration types have compiler-determined lifetime and automatic deallocation.
If the definition of a word/concept does not match how the word is used in real life, the definition is wrong. After all, semantics is about a common understanding of concepts. If your definition of a word doesn't match how it's used, using that definition isn't helpful.
Well yeah, stack variables are automatically reclaimed. What's your issue?
It's just that this is not the predominant way C programs are written: for everything else you do need to manage the memory somehow, or malloced objects would just leak. What exactly is the issue? Real-life use of C requires manually adding free calls, does it not? So it doesn't do automatic memory management for you.
Gilad Bracha wrote a fascinating piece on how tail call optimization could be implemented with periodic collection instead of immediately reusing the call frame: https://gbracha.blogspot.com/2009/12/chased-by-ones-own-tail...
The term "garbage collection" does not mean that the language has some mechanism for automatically reclaiming some memory. If it did, C would be a garbage-collected language. The term is not used in that way.
Now, of course reference counting can be used as part of a garbage collector. But that doesn't mean that any language that lets you implement reference counting as a library is a garbage-collected language.