The Garbage Collection Handbook, 2nd Edition

3 years ago (routledge.com)

On a related note, I found ART's implementation of concurrent copying and compacting GC to be pretty novel. Here's a nice write-up detailing how handling page faults in userspace comes in handy to accomplish that: https://www.tdcommons.org/cgi/viewcontent.cgi?article=4751&c... (pdf) / https://web.archive.org/web/20230216233459/https://www.tdcom...

For context, here's a brief overview of the evolution of the Android Runtime Garbage Collector: https://archive.is/Ue6Pj

  • Certainly interesting, but there are no performance numbers in the white paper comparing userfaultfd to read barriers. So the actual benefit of switching to userfaultfd is unknown (at least in publicly accessible documents -- I'm sure Google has done internal performance evaluations).

    Using page faults (and/or page protection) to perform compaction instead of barriers is a pretty old technique (see the 1988 paper by Appel, Ellis, and Li [1]; see also the Compressor by Kermany and Petrank [2]). But handling page faults was very expensive (at least on Linux) until the recent addition of userfaultfd.

    [1]: https://dl.acm.org/doi/10.1145/960116.53992 [2]: https://dl.acm.org/doi/10.1145/1133255.1134023

    • What is the actual perf benefit of userfaultfd (e.g. for write protection)? It sounds interesting, but it's unclear to me why it would be any faster than a signal handler. Is it just a simpler code path within the kernel? Or is it that the hardware is configured to jump directly to a user-space handler without any kernel intervention?

20 years ago, in the programming languages course at university, we started learning about OOP using Java and Ada as examples. When the professor started describing Java, a fellow student interrupted him to inform the class that "Java has garbage collection" (and to boast of his special knowledge).

After that incident his nickname was "the garbage collector"!

Met Richard Jones, one of the authors of the book, at its original launch. Very nice guy. Bought the original book and did some heavy reading on it – if you get this book, be prepared to put time into it, it's readable but it's not one you can do a quick skim of – and I really profited from it. It should be required reading for all programmers.

(To be clear, I'm referring to the very first version, I haven't read the subsequent versions but given the quality of the first I'd be very surprised if they were any less good).

Edit: https://www.cs.kent.ac.uk/people/staff/rej/ - a jumpoff page for his stuff.

Edit2: https://www.cs.kent.ac.uk/people/staff/rej/gcbib/ - "[This] online bibliographic database includes nearly 3,000 garbage collection-related publications. It contains abstracts for some entries and URLs or DOIs for most of the electronically available ones, and is continually being updated. The database can be searched online or downloaded as BibTeX, PostScript, or PDF." Welcome to the ultimate rabbit hole I guess.

I don't know if it is included in the new edition of the book, but in case anyone is interested in a modern, highly efficient RC implementation that does not rely on deferring the reference count updates (which kills one of the advantages of RC in the first place), check out the Perseus paper. Koka (which uses Perseus) is quite competitive with OCaml and Haskell.

Just search for "Perseus reference counting" and you'll find it. It uses linear logic to insert explicit "dup/drop" operations and then merges and coalesces them.

If you need code to understand garbage collection, the walkthrough of a garbage collector with C code at http://maplant.com/gc.html is really helpful.

I tweaked it to work on amd64 and started adding register scanning based on what eatonphil's discord people told me to do.

https://github.com/samsquire/garbage-collector

It's not fit for any purpose but more of a learning exercise.

  • Bob Nystrom (of Game Programming Patterns, Crafting Interpreters, and dartfmt fame) also wrote a tutorial implementation[1] of a precise tracing GC, as opposed to a conservative one.

    Regarding register scanning in a conservative GC, Andreas Kling has made (or at least quoted) the amusing observation[2] that your C runtime already has a primitive to dump all callee-save registers to memory: setjmp(). So all you have to do to scan both registers and stack is to put a jmp_buf onto the stack, setjmp() to it, then scan the stack normally starting from its address.

    [1] https://journal.stuffwithstuff.com/2013/12/08/babys-first-ga...

    [2] https://youtu.be/IzB6iTeo8kk

    • Glibc mangles some pointer registers in setjmp(). It XORs a per-process value with the stack pointer, frame pointer, and instruction pointer stored in the jmp_buf, on all (or nearly all) architectures.

      Although losing the stack and instruction pointers is unlikely to be a problem for the GC context, the frame pointer register need not contain a frame pointer value. It can be an arbitrary program value depending on compile options. That's something to watch out for with this GC technique.

    • Implementations are unfortunately allowed to do whatever they want to that jmp_buf; they could XOR the contents for all you know. Hopefully no implementation does something silly like that.

  • Side topic, but the maplant website design is great.

    • Besides the code blocks, it has 0 CSS. So what you're actually complimenting is your browser's default styles for HTML.

What I really want out of a garbage collector is a "Collect" function with a deadline. Pick a max time it's allowed to run before stopping and returning to the program.

  • Real-time GCs exist, such as IBM's Metronome. Though I'll be honest and say I haven't heard of many real-time GCs other than that one. Certainly many new GCs have reduced pause times dramatically, but that's orthogonal to real-time GC (as you can make pause times infinitesimally small yet still not let the mutator actually progress).

  • I've achieved GC timing that is good enough for real-time competitive game hosting using .NET 6+. The worst case over 4 hours of load testing was an 8 ms spike; the average was 1-2 ms.

    The magic trick is to intentionally collect as often as reasonably possible (i.e. at batch/frame/tick processing boundaries) and avoid using sophisticated GC schemes that involve multiple threads or asynchrony.

    Oh, and obviously you need to minimize allocations throughout or it won't matter.

  • Nim has exactly that with the GC_step proc.

    https://nim-lang.org/1.4.0/gc.html

    However, recent and future versions (2.0) are moving towards a different approach that is also applicable to deterministic real-time systems: ARC, which is basically alloc and free calls inserted automatically by the compiler using static analysis (no "runtime").

  • That is necessary but often not sufficient. It still leaves the problem of moving time from where garbage is created to the bulk 'collect' step. I've run collect every frame to keep stalls down, but the time always creeps up and up and you end up doing forensic analysis of which code is making the garbage in the first place.

  • Not really possible unless you have a real-time OS. Virtual memory and concurrency mean there is really no way to bound how long anything will take.

Great that there's a new edition of this coming. This is the definitive reference if you work with garbage collectors, it's also very well written.

Note that (the first edition of) this book considers reference counting, including variations like deferred reference counting, to be garbage collection (though not tracing garbage collection), and reviews it as well.

  • Just like most relevant computer science sources do; only Joe and Jane on the street without a CS background think otherwise.

Anyone know what's new in this edition? Both the publisher's site and the author's site are light on details.

Great that it is getting an update; it is one of the most relevant sources of information on GC algorithms.

Wow, this really is a new edition that will be available in 2023. The old (2012) edition is excellent. I don't have the impression that a whole lot has changed since then, but an update with all the latest refinements has to be worthwhile to implementers.

Who would usually need this book?

  • The first edition is a definitive reference for anyone writing a garbage collector (and by extension, exploring writing a programming language). Also people who are interested in how GCs work.

    Probably less so for people trying to optimize a program to work with an existing GC (eg. tweaking the JVM), but I suppose knowing the basic principles can help.

  • I’ve written two garbage collectors in my time, neither of which were in a programming language.

    One was for an in-memory cache of data relationships. Another was to clean up soft references within an RDF graph. Neither was, nor needed to be, particularly sophisticated.

    The cache used a compacting collector; the RDF one was mostly a "connectedness" test, pruning nodes that had fallen out of the graph.

    Recall that malloc has a simple garbage collector for its free space, and arguably what ranks a modern malloc implementation's sophistication is how it manages that free space.

    In the end detritus must be identified and resources reclaimed. So you see how GC like systems can occur in divergent areas of work.

  • This book (the 1st edition) gave me exactly what I needed when writing the garbage collector for my programming language.

    Besides the great technical content, I found it to be a very enjoyable, readable book.

  • I read the mark-and-sweep GC parts to gain some deeper understanding about Go's garbage collector. Probably the best resource for that kind of stuff other than reading actual code.

I feel the need for garbage collection is a language-design mis-feature. That is to say, producing garbage is a language-design mis-feature. To quote Bjarne Stroustrup:

> I don't like garbage. I don't like littering. My ideal is to eliminate the need for a garbage collector by not producing any garbage. That is now possible.

and it's indeed possible. For example, it's become pretty much a non-issue in modern C++: https://stackoverflow.com/a/48046118/1593077 (and C++ is not the only example; it's just a prominent example of a language which almost standardized garbage collection but eventually did not go that way).

  • C++ can actually produce quite a bit of garbage unintentionally; it's why linters will often remind you to call std::move.

    That said I much prefer deterministic resource cleanup even in a janky language like C++ over a tracing GC.

    • It's true that C++ can trigger a lot of unnecessary copying if one writes carelessly; but that's not the same as garbage, in that both copies are used, and neither continues living indefinitely without special intervention. But point taken about pass-by-value deficiencies.

  • You can't avoid garbage if you deal with generic programs -- Rust and C++ can only implement a subset of all programs without RC (which is the most basic GC algorithm).

    The same goes for many other interesting CS properties -- most of them are undecidable at compile time, so you have to handle them at runtime.

  • More and more people nowadays are programming at high levels of abstraction. If you're designing the frontend of a website, or making a mobile game, or developing a stock trading algorithm, or whatever else, then you probably don't want to constantly worry about details of memory management...

    On top of this, GC is necessary for some algorithms. Any data structure with partial sharing (e.g. binary search tree with versioning via path-copying) needs GC to be space-efficient. You could either rely on a built-in GC, or write your own. If you write your own, I think you'll find that it is tedious and error-prone due to memory safety issues.

    • We could already be doing that during the 1980's had it not been for UNIX getting out of Bell Labs with source tapes, and bad management decisions.

      https://interlisp.org/

      http://toastytech.com/guis/cedar.html

      "Eric Bier Demonstrates Cedar"

      https://www.youtube.com/watch?v=z_dt7NG38V4&t=2s

      "Making Smalltalk"

      https://www.youtube.com/watch?v=PaOMiNku1_M

      http://www.edm2.com/index.php/VisualAge_Smalltalk

      Also of note: most BASIC implementations had support for automatic memory management, at least for strings and arrays, and the structured compiled dialects did even better.

      Also database programming with languages like Clipper and FoxPro.

      And even if we stay in the UNIX world, that is exactly what using stuff like Perl allowed for: C-like programming without the headaches of manual memory management.

      Or the brief fad of 4GL languages.

    • > then you probably don't want to constantly worry about details of memory management...

      Oh, you actually don't have to; that's the whole point... in the past, you (effectively) had a choice between careful manual memory management and garbage collection with its overheads. These days, you can use constructs which take care of that management for you, with very little or no overhead, and which don't involve garbage.

      It's true that sometimes GC is algorithmically necessary; but then the high-level-of-abstraction argument is irrelevant. And in those cases, a GC library does indeed come in useful. You don't need to write your own, others have likely done it already.

Cool stuff! The last 10 years have seen lots of great languages pushing the field so I'm excited to see what's in the book =)

Bizarre that you can't preorder it yet; you have to wait until June just to preorder.

Will there be a PDF / epub option?

  • If I'm not mistaken, Routledge uses that VitalSource dreck, so you can only read their ebooks via a dedicated app. I don't buy dead-tree books any more, but I'll make an exception for this one.

    • The previous edition of this book is available on e.g. Apple's bookstore and Kindle. Unfortunately, the new one does not show an eBook option yet. I very much hope that one will be available soon, instead of them holding back the eBook to goose hardcover sales or some other reason.

Rust has left the building

  • Rust also uses reference counting, probably the worst sort of garbage collection.

    • Only when used in a naïve way, which Rust does not do. For example, increments and decrements happen only when "clone" is called and at scope exit, respectively, and thanks to Rust's ownership/borrow checking they are rarely needed, combining the best of both worlds (implementations that aggressively increment/decrement in loops and on every function call can be very slow). Rust also separates Arc (atomic refs) from Rc (non-atomic refs) and enforces the usage scenarios in the type checker, giving you cheap Rc in single-threaded scenarios. Reference counting done in a smart way works pretty well, but you obviously have to be a little careful about cycles (which in my experience are pretty rare and fairly obvious when you have such a data type).

Do we need this book now that we have Rust??