Comment by pron
17 hours ago
> What I didn’t like about this series of books was choosing “garbage collection” as an umbrella term for both tracing GC and reference counting, without checking whether the programming community would agree with that, which, as it turned out, it didn’t.
This has been the standard terminology in memory management research for many decades. The only programmers who don't like it are those who don't understand the principles of memory management.
> By that definition, C++ code has garbage collection if it uses std::shared_ptr
That's right.
> going against widespread common usage of the term “garbage collected programming language” which specifically contrasts manual languages like C++ or Rust against garbage collected ones.
Since this contrast mostly exists in the minds of people who don't understand memory management, going against this common misconception is good. That's not to say there aren't some interesting tradeoffs that often align with the colloquial perception, but "garbage collection" isn't the interesting part. As you said, both C++ and Rust use GC; in fact, they use a GC somewhat similar to the one used by CPython.
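A minimal sketch of what reference counting looks like from the program's side, assuming CPython (other Python implementations such as PyPy do not reclaim objects this eagerly). The object is reclaimed the moment its last reference goes away, which is the same role `std::shared_ptr` plays in C++ or `Rc` in Rust. `Node` and `refcount_demo` are hypothetical names for illustration; `weakref.finalize` is used only to observe when the object is freed.

```python
import weakref

class Node:
    """An ordinary heap object; CPython keeps a reference count for it."""

def refcount_demo():
    freed = []
    obj = Node()
    # Observe deallocation without holding a strong reference to obj.
    weakref.finalize(obj, freed.append, "obj")
    alias = obj                     # second strong reference: count is now 2
    del obj                         # count drops to 1; the object survives
    after_first_del = list(freed)
    del alias                       # count drops to 0; CPython frees it right here
    after_second_del = list(freed)
    return after_first_del, after_second_del

print(refcount_demo())  # ([], ['obj'])
```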
This reminds me a bit of the way academics in programming language theory internalized the type-theoretic definition of the word “type” over and against the traditional programming definition. You sometimes see people who try to correct the term “dynamically typed language,” which makes perfect sense when types are data types, to “untyped” or “unityped,” which makes sense when types are mathematical constructs equivalent to proofs.
The colloquial term is clear in context, and it draws its boundaries in useful places. If academia prefers other boundaries to simplify its formal definitions, that’s understandable. But the rest of us shouldn’t restrict our language in that way.
It's not about restricting the language. It's that practising programmers often don't know a subject well enough, so they use different words to make distinctions that don't matter as much as they think (see "transpile"). "Dynamically typed" is actually not that big of an offence (because the distinction is real, it's just that the terminology is a bit muddled), and the people in PL theory who are bothered by this (most notably one person) are considered pedants even among their colleagues.
E.g. many practising programmers don't know that tracing moving collectors are used to avoid some of the high overheads associated with memory allocators (malloc/free), which are themselves big and complex beasts that make up substantial "runtimes" (another misused and misleading word).
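To make the point about moving collectors concrete, here is a toy semispace (Cheney-style) copying collector, modelled in Python under loud simplifying assumptions: "memory" is a Python list, an object is a list of field "pointers" (indices into that list), and appending to the list stands in for bumping an allocation pointer. `SemispaceHeap`, `alloc`, and `collect` are hypothetical names. Allocation is just a bump, and a collection copies only what is reachable from the roots; dead objects are never visited, and there is no per-object `free`.

```python
class SemispaceHeap:
    """Toy copying collector: each object is a list of indices into the current space."""

    def __init__(self):
        self.space = []                      # from-space; appending models a pointer bump

    def alloc(self, fields):
        self.space.append(list(fields))
        return len(self.space) - 1           # the new object's "address"

    def collect(self, roots):
        """Copy everything reachable from roots into a fresh space and return the new roots."""
        to_space, forward = [], {}           # forward maps old address -> new address

        def copy(addr):
            if addr not in forward:
                forward[addr] = len(to_space)
                to_space.append(list(self.space[addr]))  # fields still hold old addresses
            return forward[addr]

        new_roots = [copy(r) for r in roots]
        scan = 0
        while scan < len(to_space):          # Cheney scan: rewrite fields breadth-first
            to_space[scan] = [copy(f) for f in to_space[scan]]
            scan += 1
        self.space = to_space                # old space dropped wholesale, garbage included
        return new_roots

heap = SemispaceHeap()
leaf = heap.alloc([])
root = heap.alloc([leaf])
for _ in range(1000):
    heap.alloc([])                           # short-lived garbage
(root,) = heap.collect([root])
print(len(heap.space))                       # 2: only the live objects were copied
```

The work done by `collect` here is proportional to the two live objects, not the thousand dead ones; the price in this design is the memory reserved for the second space.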
I think GC's definition is pretty clear-cut. How is counting references to determine when a lifetime ends materially different from another way of doing the same thing? Like there is even a paper that shows that one is tracking liveness, while the other tracks "deadness" and they are literally going at the same thing from different ends.
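A toy model of that liveness/deadness duality, with the heap as a hypothetical dictionary mapping each object to the objects it references. `trace_dead` works forward from the roots and computes liveness; `refcount_dead` starts from objects whose count is zero and propagates decrements, computing deadness. Both function names are made up for this sketch. On acyclic garbage they agree; an unreachable cycle is where plain reference counting falls short, which is why CPython pairs it with a cycle collector.

```python
from collections import deque

def trace_dead(heap, roots):
    """Tracing: mark everything reachable from the roots (live); the rest is dead."""
    live, queue = set(roots), deque(roots)
    while queue:
        for child in heap[queue.popleft()]:
            if child not in live:
                live.add(child)
                queue.append(child)
    return set(heap) - live

def refcount_dead(heap, roots):
    """Reference counting: start from zero counts and propagate decrements (dead)."""
    counts = {obj: 0 for obj in heap}
    for obj in roots:
        counts[obj] += 1
    for children in heap.values():
        for child in children:
            counts[child] += 1
    dead = set()
    stack = [obj for obj, n in counts.items() if n == 0]
    while stack:
        obj = stack.pop()
        dead.add(obj)
        for child in heap[obj]:
            counts[child] -= 1
            if counts[child] == 0:
                stack.append(child)
    return dead

heap = {
    "root": ["a"], "a": ["b"], "b": [],
    "x": ["y"], "y": [],              # unreachable chain
    "c1": ["c2"], "c2": ["c1"],       # unreachable cycle
}
print(sorted(trace_dead(heap, ["root"])))     # ['c1', 'c2', 'x', 'y']
print(sorted(refcount_dead(heap, ["root"])))  # ['x', 'y']
```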
If anything, I often see a bias against tracing GCs from the people misusing the term: they "hype up" their language of choice as better for not having (tracing) GC, when it usually just has ref counting, which by many metrics is actually worse given equal usage. Rust and C++ get away with that because they use it on only a handful of objects, with other lifetimes driven by RAII, which is pretty much just compile-time-decidable ref counting, isn't it?
Right, and there are differences within tracing GCs that are just as big as those between refcounting (and even manual malloc/free) and tracing. For example, Go uses tracing to determine when an object's lifetime ends. But the moving collectors in Java, .NET, and V8 don't know and don't care when objects die, and they have no "free" operation at all. In many ways, the memory-management performance profiles (whether they favour a smaller footprint or higher throughput) of C++, Rust, Python, and Go have more in common with one another than with those of Java, .NET, V8, and Zig, which in turn resemble each other (arenas, like moving collectors, don't need or want to know when an object's lifetime ends).
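A minimal arena sketch, in Python and with hypothetical names (`Arena`, `alloc`, `reset`): allocation bumps an offset into one preallocated buffer, and the only way to reclaim memory is to release everything at once. Like the moving collector, the arena never learns when any individual object dies. This is only a model; a real arena hands out raw memory rather than views into a `bytearray`.

```python
class Arena:
    """Toy arena: bump allocation from one buffer, released all at once."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity)
        self.top = 0                          # the bump pointer (an offset into buf)

    def alloc(self, size):
        if self.top + size > len(self.buf):
            raise MemoryError("arena exhausted")
        start = self.top
        self.top += size                      # allocating is just an increment
        return memoryview(self.buf)[start:self.top]

    def reset(self):
        # Every allocation dies together; nothing is ever freed individually.
        # Views handed out earlier now alias memory that will be reused.
        self.top = 0

arena = Arena(64)
a = arena.alloc(16)
b = arena.alloc(16)
a[:] = b"x" * 16
print(arena.top)  # 32
arena.reset()
print(arena.top)  # 0
```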
Another distinction without a difference that is really just giving a name to a misconception is the notion of "a runtime". When I learnt C in the late 80s or early 90s, the book said something like, "C is not just the language, but a rich runtime". Indeed, modern malloc/free implementations mean that a C program ends up needing a larger and more elaborate runtime than a program in some educational language that uses a trivial implementation of a mark-and-sweep collector. Modern malloc/free allocators also sometimes come with an impressive set of tuning knobs. It's just that people who haven't had a lot of experience writing large programs in low-level languages don't know about them (or they just work to avoid allocations as much as possible, because that's what they've been told to do).
> Like there is even a paper that shows that one is tracking liveness, while the other tracks "deadness" and they are literally going at the same thing from different ends.
https://dl.acm.org/doi/10.1145/1035292.1028982
I think a lot of people just want to be able to discuss different areas of the automatic memory management design space separately, and maintaining the distinction between reference counting and garbage collection (meaning tracing GCs) lets them do that.
As for me personally, I consider refcounting and GC overlapping categories. I am perfectly willing to call CPython’s reference counting plus cycle collector a form of garbage collection, because it is transparent to the programmer. Every memory management technique has tradeoffs and pathological edge cases, but since you don’t have to consider them in the ordinary course of programming I’d say it counts. If you had to break cycles manually, or to annotate which references should be counted, I’d call that refcounting but not GC – as in the C++ stdlib.
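A small CPython-specific sketch of that transparency: reference counting alone cannot reclaim a cycle, and the cycle collector (the `gc` module) picks it up without the program breaking the cycle by hand. Automatic collection is switched off temporarily only so the two stages can be observed deterministically; `Node` and `cycle_demo` are hypothetical names. In C++ the programmer would instead have to break such a cycle, typically by making one of the links a `std::weak_ptr`.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.other = None

def cycle_demo():
    was_enabled = gc.isenabled()
    gc.disable()                        # observe refcounting on its own first
    try:
        freed = []
        a, b = Node(), Node()
        a.other, b.other = b, a         # a reference cycle
        weakref.finalize(a, freed.append, "a")
        weakref.finalize(b, freed.append, "b")
        del a, b                        # each count drops to 1, never to 0
        after_del = sorted(freed)       # refcounting alone frees nothing
        gc.collect()                    # the tracing cycle collector finds the cycle
        after_collect = sorted(freed)
    finally:
        if was_enabled:
            gc.enable()
    return after_del, after_collect

print(cycle_demo())  # ([], ['a', 'b'])
```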
By that definition even C has garbage collection. Objects with automatic storage duration have compiler-determined lifetimes and are deallocated automatically.
If the definition of a word/concept does not match how the word is used in real life, the definition is wrong. After all, semantics is about the common understanding of concepts. If your definition of a word doesn't match how it's used, using that definition isn't beneficial.