I have this book and like it.
I also wish there was a book that guided you through the process of implementing a language with accurate garbage collection, similar to how Crafting Interpreters teaches you to implement a language. Perhaps it could start with a shadow stack + simple mark-and-sweep and then move on to stack maps + generational GC.
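To make the suggested starting point concrete, here is a minimal sketch of a shadow stack plus simple mark-and-sweep. It is a toy model, not a real runtime: the names `Obj`, `Heap`, `push_root`, and `collect` are hypothetical, and "freeing" just means dropping the object from the heap's list.

```python
class Obj:
    """A heap object: a name plus references to other heap objects."""
    def __init__(self, name):
        self.name = name
        self.refs = []       # outgoing references to other Obj instances
        self.marked = False  # mark bit used by the collector


class Heap:
    def __init__(self):
        self.objects = []       # every allocated object not yet swept
        self.shadow_stack = []  # explicit root list maintained by the program

    def alloc(self, name):
        obj = Obj(name)
        self.objects.append(obj)
        return obj

    def push_root(self, obj):
        self.shadow_stack.append(obj)

    def pop_root(self):
        return self.shadow_stack.pop()

    def collect(self):
        """Run one mark-and-sweep cycle and return the number of objects freed."""
        # Mark phase: trace everything reachable from the shadow stack.
        worklist = list(self.shadow_stack)
        while worklist:
            obj = worklist.pop()
            if not obj.marked:
                obj.marked = True
                worklist.extend(obj.refs)
        # Sweep phase: keep marked objects, drop the rest, reset mark bits.
        survivors = [o for o in self.objects if o.marked]
        freed = len(self.objects) - len(survivors)
        for o in survivors:
            o.marked = False
        self.objects = survivors
        return freed
```

Because the collector traces from roots instead of counting references, an unreachable cycle is reclaimed like any other garbage. Stack maps would replace the explicit `push_root`/`pop_root` calls with compiler-generated metadata describing where the roots live.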
I had the 2012 print edition. One of the best books available - the best book that I knew - about GC at that time.
Anti-pattern: Regarding the 2023 e-book edition, I do not see a way to buy it from the site, or even a link to buy.
I thought I was missing something but figured they just didn't have a link to purchase it. I ended up going to the publisher's site to track it down:
https://www.routledge.com/The-Garbage-Collection-Handbook-Th...
How on earth is the ebook more expensive than the physical copy!
I have the 2012 print edition too and fully concur with the assessment. The best book about garbage collection at the time (and maybe still?)
I remember reading it before. My son threw it away when we moved houses, not knowing how important it was. I'd recommend it.
Ironic that this of all books got garbage collected prematurely.
Q: what kind of collection is this real world example illustrating?
A: Copying Garbage Collector (semi space). Chapter 4!
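For anyone who has not met the term: a semi-space copying collector moves everything still reachable into a fresh space and abandons whatever is left behind, much like a house move. A rough sketch in the style of Cheney's algorithm follows. The `Cell` class and `copy_collect` function are hypothetical names, and Python lists stand in for the two memory spaces.

```python
class Cell:
    def __init__(self, value, fields=()):
        self.value = value
        self.fields = list(fields)  # references to other Cells (or None)
        self.forward = None         # forwarding pointer, set once the cell is copied


def copy_collect(roots):
    """Copy everything reachable from `roots` into a fresh to-space.

    Returns (new_roots, to_space). Anything not copied is garbage.
    """
    to_space = []

    def evacuate(cell):
        if cell is None:
            return None
        if cell.forward is None:
            # First visit: copy the cell; its fields still point into from-space.
            copy = Cell(cell.value, cell.fields)
            cell.forward = copy
            to_space.append(copy)
        return cell.forward

    new_roots = [evacuate(r) for r in roots]
    scan = 0
    while scan < len(to_space):  # to_space doubles as the work queue
        cell = to_space[scan]
        # Redirect each field to the to-space copy, evacuating on demand.
        cell.fields = [evacuate(f) for f in cell.fields]
        scan += 1
    return new_roots, to_space
```

The forwarding pointer is what keeps shared objects and cycles intact: a cell is copied once, and every later reference to it is redirected to the same copy.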
Great book. I was always fascinated by Baker's treadmill. Always wanted a real world case where I could implement one with Fibonacci sized mills.
Previously:
Dec 2025 https://news.ycombinator.com/item?id=35492307
How good are AIs at coding manual memory management? Is this a sea change in automatic memory management?
In my opinion, "manual" memory management, i.e. where the programmer is responsible for invoking "free", was a serious mistake. It was introduced by the IBM PL/I programming language at the end of 1964 and inherited by C and other languages, and it was already an obsolete technique on the date of its introduction.
When the explicit "free" was invented, automatic memory reclamation while avoiding the non-determinism of garbage collectors had already been known for 4 years, since 1960, when another IBM employee had invented reference counting (as a reaction to the garbage collector of LISP I).
When implemented naively, reference counting has some disadvantages, but those can be circumvented relatively easily in an optimized implementation. The book discussed in the parent article also has a chapter about reference counting.
I have written C programs for many decades, but I have never invoked "free" directly, because I have always used reference counts. I have never encountered a circumstance when I would have wanted to invoke "free" directly.
C has the disadvantage that the compiler will not implicitly do things like virtual function invocation or reference count handling, but any such technique that is provided by higher-level languages can still be used in a language like C, even if it requires more boilerplate code.
I do not like the "shared_ptr" implementation of reference counting in C++, because that data type is not directly usable in places where a plain reference or pointer is expected. Implementations that do not have this problem exist.
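The discipline described above, where code never frees an object directly and only drops references, can be sketched roughly as follows. This is an illustration in Python, not the C code in question: `RefCounted`, `retain`, and `release` are hypothetical names, and the `on_free` callback stands in for the single place where "free" would actually be invoked.

```python
class RefCounted:
    """An object whose lifetime is governed by an explicit reference count."""
    def __init__(self, name, on_free):
        self.name = name
        self.count = 1       # the creator holds the first reference
        self.children = []   # owned references to other RefCounted objects
        self._on_free = on_free

    def add_child(self, child):
        retain(child)        # storing a reference takes one count
        self.children.append(child)


def retain(obj):
    """Take an additional reference to obj."""
    obj.count += 1
    return obj


def release(obj):
    """Drop one reference; at zero, release the children and free the object."""
    assert obj.count > 0, "release of an already freed object"
    obj.count -= 1
    if obj.count == 0:
        for child in obj.children:
            release(child)
        obj.children.clear()
        obj._on_free(obj.name)  # stands in for the only call to free()
```

In C the same shape is typically a count field in the struct plus a pair of retain/release functions, which is the boilerplate the compiler will not write for you. This naive version does not reclaim cycles and recurses on release, which are among the disadvantages an optimized implementation has to address.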
I too have written C programs for decades. I encountered reference counting when I learned how to write Windows kernel drivers. It was very liberating to see that there were so many fewer opportunities for memory leaks when reference counts were applied liberally.
GC's strength is not only in the ease of writing but also in reading, since you don't need to interleave allocation and business logic everywhere (be it through types or imperative code).
GC simply is the only way to approach the clarity of pseudo-code in real code. That's one of my later realizations concerning the subject (https://world-playground-deceit.net/blog/2024/11/how-i-learn...)
You mean "interleave deallocation and business logic everywhere".
For allocation there is no difference between automatic memory management with garbage collectors or reference counts and manual memory management, where the programmer is responsible for invoking "free".
These alternative memory management methods differ only in how deallocation is handled.
Allocation must always be done by defining a new object, regardless of how memory is managed. Moreover, allocation also does not depend on whether an object is allocated in static storage, in a stack or in a heap. You always must define the object, so that memory should be allocated for it at compile-time if in static storage, or at run-time if in a stack or in a heap.
I have never seen Codex or Claude get manual memory management wrong. I used to be pretty fastidious about using leak sanitizer or other such tools to catch my own memory management issues, and while not quite useless, that sort of testing has dropped way down my list of worries the more I lean on LLMs. I am constantly surprised by how many formerly tedious or error-prone tasks stopped being either of those, and I expect to see practice shift away from middle-safe languages like C++, not just to much safer languages like Rust but, surprisingly, also to much less safe ones like C and platform-specific assembly.
The hard part was never getting it correct on a local scope; that's mostly solved by a linter, or even by C++'s RAII.
The hard part is doing it correctly on a global scope with non-trivial lifetimes, possibly influenced by multiple threads.
And in my experience LLMs are still hit or miss on these kinds of problems. They can find problems from time to time, but they can't reliably reason about more complex global state. They will come up with "hypotheses" along the lines of 'oh sure, this is the root cause of the issue', only to say something completely wrong (which you may or may not notice, only for it to fail later).
In Zig, pretty good, but not perfect yet
What I didn’t like about this series of books was the choice of “garbage collection” as the umbrella term for both tracing GC and reference counting, without verifying whether the programming community would agree with that, which it turned out they didn’t.
I’ve seen a lot of threads here and on reddit where people were arguing about terminology purely because of this book alone.
By that definition, C++ code has garbage collection if it uses std::shared_ptr, going against widespread common usage of the term “garbage collected programming language” which specifically contrasts manual languages like C++ or Rust against garbage collected ones.
“Automatic Memory Management” is a much more suitable description of what programmers have to do to manage memory; it is now in the title but still hasn’t become the primary term.
> What I didn’t like about this series of books was the choice of “garbage collection” as the umbrella term for both tracing GC and reference counting, without verifying whether the programming community would agree with that, which it turned out they didn’t.
This has been the standard terminology in memory management research for many decades. The only programmers who don't like it are those who don't understand the principles of memory management.
> By that definition, C++ code has garbage collection if it uses std::shared_ptr
That's right.
> going against widespread common usage of the term “garbage collected programming language” which specifically contrasts manual languages like C++ or Rust against garbage collected ones.
Since this contrast mostly exists in the minds of people who don't understand memory management, going against this common misconception is good. That's not to say that there aren't some interesting tradeoffs that often align with the colloquial perception, but "garbage collection" isn't the interesting part. As you said, both C++ and Rust use GC; in fact, they use a GC somewhat similar to the one used by CPython.
This reminds me a bit of the way academics in programming language theory internalized the type-theoretic definition of the word “type” over and against the traditional programming definition. You sometimes see people who try to correct the term “dynamically typed language,” which makes perfect sense when types are data types, to “untyped” or “unityped,” which makes sense when types are mathematical constructs equivalent to proofs.
The colloquial term is clear in context, and it draws its boundaries in useful places. If academia prefers other boundaries to simplify its formal definitions, that’s understandable. But the rest of us shouldn’t restrict our language in that way.
I've always considered shared_ptr to be semi-garbage collection. Allows me to code C++ almost as if it were Java so long as circular references are avoided. I'm perfectly fine with it being considered a type of garbage collection.
> so long as circular references are avoided
And there's always weak_ptr if a cycle makes sense for some reason but you still want it to clean up correctly. Like having a child node point up to a parent or the root in a tree structure.
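The parent-pointer pattern can be sketched like this. It uses Python's `weakref` module as a stand-in for `weak_ptr`, since CPython is also reference counted; the `Node` class is a hypothetical example, not code from any particular project.

```python
import weakref


class Node:
    def __init__(self, name):
        self.name = name
        self.children = []   # strong references: a parent owns its children
        self._parent = None  # weak reference: a child does not own its parent

    def add_child(self, child):
        child._parent = weakref.ref(self)
        self.children.append(child)

    @property
    def parent(self):
        # Dereference the weak reference; yields None once the parent is gone.
        return self._parent() if self._parent is not None else None
```

Since the upward link does not contribute to the parent's count, dropping the last outside reference to the root lets the whole tree be reclaimed by counting alone. The cost is that every use of `parent` has to handle the case where the parent no longer exists, just as `weak_ptr::lock()` can return an empty pointer.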
The Linux kernel has garbage collection, and not just the controversial refcount kind.
Instead of "garbage collection", you can say "dynamic lifetime determination". If code does work at runtime to answer the question "is it safe to free this piece of memory?", that's dynamic lifetime determination, and is a property shared by both reference-counting and more sophisticated GC schemes.