Comment by hamstergene
18 hours ago
What I didn’t like about this series of books was choosing “garbage collection” as an umbrella term for both tracing GC and reference counting, without verifying whether the programming community would agree with that, which, as it turned out, they didn’t.
I’ve seen a lot of threads here and on reddit where people were arguing about terminology purely because of this book alone.
By that definition, C++ code has garbage collection if it uses std::shared_ptr, going against widespread common usage of the term “garbage collected programming language” which specifically contrasts manual languages like C++ or Rust against garbage collected ones.
“Automatic Memory Management” is a much more suitable description of what programmers have to do to manage memory; it is now in the title but still hasn’t become the primary term.
> What I didn’t like about this series of books was choosing “garbage collection” as an umbrella term for both tracing GC and reference counting, without verifying whether the programming community would agree with that, which, as it turned out, they didn’t.
This has been the standard terminology in memory management research for many decades. The only programmers who don't like it are those who don't understand the principles of memory management.
> By that definition, C++ code has garbage collection if it uses std::shared_ptr
That's right.
> going against widespread common usage of the term “garbage collected programming language” which specifically contrasts manual languages like C++ or Rust against garbage collected ones.
Since this contrast mostly exists in the minds of people who don't understand memory management, going against this common misconception is good. That's not to say that there aren't some interesting tradeoffs that often align with the colloquial perception, but "garbage collection" isn't the interesting part. As you said, both C++ and Rust use GC; in fact, they use a GC somewhat similar to the one used by CPython.
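To make the std::shared_ptr point concrete, here is a minimal sketch of the runtime bookkeeping involved. The `Tracked` type, the `Observation` struct and `observe_shared_ptr` are hypothetical names invented for this illustration; only `std::make_shared`, `use_count()` and `reset()` are real library calls.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type for illustration: bumps a counter when it is destroyed.
struct Tracked {
    int* destroyed;
    explicit Tracked(int* d) : destroyed(d) {}
    ~Tracked() { ++*destroyed; }
};

// What we observe while owners come and go (hypothetical helper struct).
struct Observation {
    long owners_peak;      // use_count() while two owners exist
    int destroyed_before;  // destructor calls before the last owner lets go
    int destroyed_after;   // destructor calls after the last owner lets go
};

Observation observe_shared_ptr() {
    int destroyed = 0;
    Observation obs{};
    auto a = std::make_shared<Tracked>(&destroyed);  // count is 1
    {
        auto b = a;                                  // second owner: count is 2
        obs.owners_peak = a.use_count();
    }                                                // b is gone: count is 1 again
    obs.destroyed_before = destroyed;                // object is still alive
    a.reset();                                       // count reaches 0, object is freed
    obs.destroyed_after = destroyed;
    assert(obs.destroyed_after == obs.destroyed_before + 1);
    return obs;
}
```

Nothing in the source code says where the object dies; the count decides at runtime, which is the sense in which the replies above call this a form of GC.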
This reminds me a bit of the way academics in programming language theory internalized the type-theoretic definition of the word “type” over and against the traditional programming definition. You sometimes see people who try to correct the term “dynamically typed language,” which makes perfect sense when types are data types, to “untyped” or “unityped,” which makes sense when types are mathematical constructs equivalent to proofs.
The colloquial term is clear in context, and it draws its boundaries in useful places. If academia prefers other boundaries to simplify its formal definitions, that’s understandable. But the rest of us shouldn’t restrict our language in that way.
It's not about restricting the language. It's that practising programmers often don't know a subject well enough, so they use different words to make distinctions that don't matter as much as they think (see "transpile"). "Dynamically typed" is actually not that big of an offence (because the distinction is real, it's just that the terminology is a bit muddled), and the people in PL theory who are bothered by this (most notably one person) are considered pedants even among their colleagues.
E.g. many practising programmers don't know that tracing moving collectors are used to avoid some of the high overheads associated with memory allocators (malloc/free), which are themselves big and complex beasts that make up substantial "runtimes" (another misused and misleading word).
I think GC's definition is pretty clear cut. How is counting references to determine when a lifetime ends materially different from another way of doing the same thing? There is even a paper showing that one tracks liveness while the other tracks "deadness"; they are literally approaching the same problem from opposite ends.
If anything, I often see a bias against tracing GCs from the people misusing the term, who "hype up" their choice of language as better for not having a (tracing) GC, when it usually just has ref counting, which by many metrics is actually worse given equal usage. Rust and C++ get away with that because they use ref counting on only a handful of objects, with other lifetimes driven by RAII, which is pretty much just compile-time-decidable ref counting?
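A toy model of the "liveness vs. deadness" framing, as a sketch only: `ToyHeap`, `trace_live`, `count_live` and `example_heap` are hypothetical names, objects are plain indices, and this is not how any production collector is implemented. Tracing starts at the roots and marks what is reachable; counting starts at objects with a zero count and propagates death outward.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy heap: objects are indices, edges[i] lists what object i references.
struct ToyHeap {
    std::vector<std::vector<std::size_t>> edges;
    std::vector<std::size_t> roots;
};

// Tracing: start from the roots and mark everything reachable as live.
std::vector<bool> trace_live(const ToyHeap& h) {
    std::vector<bool> live(h.edges.size(), false);
    std::vector<std::size_t> stack(h.roots.begin(), h.roots.end());
    while (!stack.empty()) {
        std::size_t n = stack.back();
        stack.pop_back();
        assert(n < h.edges.size());
        if (live[n]) continue;
        live[n] = true;
        for (std::size_t m : h.edges[n]) stack.push_back(m);
    }
    return live;
}

// Counting: tally incoming references (roots included), then repeatedly
// declare zero-count objects dead and drop the references they held.
std::vector<bool> count_live(const ToyHeap& h) {
    std::vector<std::size_t> rc(h.edges.size(), 0);
    for (std::size_t r : h.roots) ++rc[r];
    for (const auto& out : h.edges)
        for (std::size_t m : out) ++rc[m];

    std::vector<bool> live(h.edges.size(), true);
    std::vector<std::size_t> dead;
    for (std::size_t i = 0; i < rc.size(); ++i)
        if (rc[i] == 0) dead.push_back(i);
    while (!dead.empty()) {
        std::size_t n = dead.back();
        dead.pop_back();
        live[n] = false;
        for (std::size_t m : h.edges[n])
            if (--rc[m] == 0) dead.push_back(m);
    }
    return live;
}

// Object 0 is a root pointing to 1; 2 is garbage pointing to 3;
// 4 and 5 reference each other but are unreachable from any root.
ToyHeap example_heap() {
    ToyHeap h;
    h.edges = {{1}, {}, {3}, {}, {5}, {4}};
    h.roots = {0};
    return h;
}
```

On `example_heap()` both functions agree about objects 0 through 3. They differ only on the unreachable cycle (4 and 5): tracing reports it dead, while plain counting never sees its counts reach zero.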
I've always considered shared_ptr to be semi-garbage collection. Allows me to code C++ almost as if it were Java so long as circular references are avoided. I'm perfectly fine with it being considered a type of garbage collection.
> so long as circular references are avoided
And there's always weak_ptr if a cycle makes sense for some reason but you still want it to clean up correctly. Like having a child node point up to a parent or the root in a tree structure.
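A minimal sketch of that parent-pointer case, assuming a hypothetical `Node` type and a hypothetical `build_and_drop_tree` helper (neither comes from any real codebase): children are owned through `std::shared_ptr`, and the pointer back up is a `std::weak_ptr`, so there is no ownership cycle.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical tree node for illustration.
struct Node {
    std::vector<std::shared_ptr<Node>> children;  // owning: a parent keeps its children alive
    std::weak_ptr<Node> parent;                   // non-owning: does not keep the parent alive
    int* destroyed;
    explicit Node(int* d) : destroyed(d) {}
    ~Node() { ++*destroyed; }
};

// Builds a root with two children that point back up, drops the root,
// and returns how many nodes were destroyed.
int build_and_drop_tree() {
    int destroyed = 0;
    {
        auto root = std::make_shared<Node>(&destroyed);
        for (int i = 0; i < 2; ++i) {
            auto child = std::make_shared<Node>(&destroyed);
            child->parent = root;  // weak back edge, so no ownership cycle
            root->children.push_back(child);
        }
        // lock() yields a temporary owning pointer while the parent is still alive.
        assert(root->children[0]->parent.lock() == root);
    }  // the only owner of root goes away here
    return destroyed;
}
```

If `parent` were an owning `std::shared_ptr<Node>` instead, the root and its children would keep each other alive and none of the destructors would run.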
The Linux kernel has garbage collection, and not just the controversial refcount kind.
I'll go further. Linux heavily uses a form of garbage collection that cannot be implemented in typical userspace (without awkward & slower additions to the consistency algorithm).
https://en.wikipedia.org/wiki/Read-copy-update
Instead of "garbage collection", you can say "dynamic lifetime determination". If code does work at runtime to answer the question "is it safe to free this piece of memory?", that's dynamic lifetime determination, and is a property shared by both reference-counting and more sophisticated GC schemes.