Comment by kazinator
13 hours ago
Reference counting does not trace dead objects. Most of its activity is concerned with maintaining reference counts on live objects.
When a reference count hits zero, that's when refcounting begins to be concerned with a dead object; and that part of its operation corresponds to the sweep activity in garbage collection, not to the tracing of live objects.
It is not a dual to garbage collection concerned with its negative spaces; it's simply a less general (we could legitimately say lesser) garbage collection that doesn't deal with cycles on its own.
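As a rough illustration of that split (a toy Rust example using std::rc::Rc; the Node type is made up): everything before the final drop is bookkeeping on a live object, and only the last decrement does any dead-object work.

    use std::rc::Rc;

    struct Node {
        name: &'static str,
    }

    impl Drop for Node {
        fn drop(&mut self) {
            // Runs only when the count reaches zero -- the "dead object" part,
            // analogous to sweeping/reclamation rather than to tracing.
            println!("freeing {}", self.name);
        }
    }

    fn main() {
        let a = Rc::new(Node { name: "a" });
        let b = Rc::clone(&a);                        // live-object bookkeeping: 1 -> 2
        println!("count = {}", Rc::strong_count(&a)); // prints 2
        drop(b);                                      // live-object bookkeeping: 2 -> 1
        drop(a);                                      // 1 -> 0: Drop runs, memory reclaimed
    }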
> it's simply a less general (we could legitimately say lesser) garbage collection that doesn't deal with cycles on its own.
There are different implementation and performance trade-offs associated with both. I’ll focus on the two that are most meaningful to me.
Reference counting can be added as a library to languages that don’t want or can’t have a precise garbage collector. If you work in C++ (or Rust), it’s a very viable way to ensure you have some measure of non-manual cleanup while maintaining precise resource control.
Similarly, when performance matters, reference counting is essentially deterministic and much easier to understand and model.
In a lot of situations, garbage collection is an overall better strategy, but it’s not a strict superset, and not always the right choice.
> Similarly, when performance matters, reference counting is essentially deterministic and much easier to understand and model.
Is it? What happens if you remove that one last reference to a long chain of objects? You might unexpectedly be doing a ton of freeing and have a long pause. And free itself can be expensive.
> Is it?
Yes.
> What happens if you remove that one last reference to a long chain of objects?
A mass free sometime vaguely in the future, based on the GC's whims and knobs and tuning, when doing non-refcounting garbage collection.
A mass free there and then, when refcounting. Which might still cause problems - but they are at least deterministic problems. Problems that will show up in ≈any profiler exactly where the last reference was lost, which you can then choose to e.g. ameliorate (at least when you have source access) by choosing a more appropriate allocator. Or deferring cleanup over several frames, if that's what you're into. Or eating the pause for less cache thrashing and higher throughput. Or mixing strategies depending on application context (game (un)loading screen probably prioritizes throughput, streaming mid-gameplay probably prioritizes framerate...)
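Roughly, in Rust (chain length, payload size, and the iterative Drop are just illustrative choices): every byte of that chain is reclaimed at the drop(head) line, which is exactly where a profiler will point.

    use std::rc::Rc;
    use std::time::Instant;

    struct Node {
        _payload: [u8; 64],
        next: Option<Rc<Node>>,
    }

    impl Drop for Node {
        fn drop(&mut self) {
            // Unlink iteratively so a long chain doesn't recurse through
            // the compiler-generated drop glue.
            let mut next = self.next.take();
            while let Some(rc) = next {
                match Rc::try_unwrap(rc) {
                    Ok(mut node) => next = node.next.take(),
                    Err(_) => break, // another owner keeps the tail alive
                }
            }
        }
    }

    fn main() {
        // Build a chain of 100_000 refcounted nodes.
        let mut head: Option<Rc<Node>> = None;
        for _ in 0..100_000 {
            head = Some(Rc::new(Node { _payload: [0; 64], next: head.take() }));
        }

        let t = Instant::now();
        drop(head); // the last reference: the whole chain is freed right here
        println!("cascade free took {:?}", t.elapsed());
    }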
> You might unexpectedly be doing a ton of freeing and have a long pause. And free itself can be expensive.
Much more rarely than GC pauses cause problems IME.
> There are different implementation and performance trade-offs associated with both.
They are not the same because there are "semantic tradeoffs".
> Reference counting does not trace dead objects.
> […]
> and that part of its operation corresponds to the sweep activity in garbage collection, not to the tracing of live objects.
The sweep activity in garbage collection doesn't trace live objects. https://en.wikipedia.org/wiki/Garbage_collection_(computer_s...:
“The overall strategy consists of determining which objects should be garbage collected by tracing which objects are reachable by a chain of references from certain root objects, and considering the rest as garbage and collecting them.”
The tracing (or mark) part is, appropriately, the one that traces objects.
Sweeping is just
> considering the rest as garbage and collecting them
Reference counting does trace dead objects. When the reference count hits zero, you have to recursively trace through all objects referenced by the newly dead object. That’s a trace of dead objects.
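Hand-rolled, it looks something like this (a toy arena of Option slots with indices standing in for pointers; all names are made up):

    struct Obj {
        refcount: usize,
        children: Vec<usize>, // indices of objects this one references
    }

    fn dec_ref(heap: &mut Vec<Option<Obj>>, i: usize) {
        let children = {
            let obj = heap[i].as_mut().expect("decremented a freed object");
            obj.refcount -= 1;
            if obj.refcount > 0 {
                return; // still live: pure bookkeeping, nothing to visit
            }
            obj.children.clone()
        };
        // Count hit zero: the object is dead. Walking its references
        // recursively visits exactly the newly dead part of the graph.
        for c in children {
            dec_ref(heap, c);
        }
        heap[i] = None; // reclaim the slot
    }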
That can be identified as a finalization-driven sweep. The object whose refcount hits zero is finalized, and the routine for that drops its references to other objects.
Garbage collection also traces dead objects. Or at least some kinds of GC implementations that are not copying do: when the marking is done, the heap is traversed again to identify dead objects, which are put onto a free list. That's a trace of dead objects. (Under copying collection, that is implicit; live objects are moved to a new heap and the vacated space is entirely made available for bump allocation.)
I think you're skipping over some important distinctions here.
In a mark & sweep GC, the mark phase traverses the object graph, visiting only live objects. You need to recursively visit any objects that are not already marked — this is the process known as tracing. The marking time is proportional to the number of live objects.
In the sweep phase, you typically do a linear traversal of memory and reclaim any objects that are not marked. You do not examine pointers inside any objects, so the graph structure is irrelevant. The time is proportional to the total size of memory.
In reference counting, when a refcount hits 0, you need to decrement the refcount of pointed-to objects, and recursively visit any objects whose refcount is now 0. The time is proportional to the number of objects that have just died.
Structurally, decrementing refcounts is *very* similar to tracing. You’re right that its purpose is similar to the sweep phase of a mark & sweep GC, but they aren’t algorithmically similar.
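A toy version of the two phases, to make the contrast visible (same arena-with-indices style as a hand-rolled refcounter; not modeled on any particular collector):

    struct Obj {
        marked: bool,
        children: Vec<usize>, // indices of referenced objects
    }

    // Mark: a graph trace that visits only live objects.
    // Time is proportional to the number of live objects.
    fn mark(heap: &mut Vec<Option<Obj>>, i: usize) {
        let children = match heap[i].as_mut() {
            Some(obj) if !obj.marked => {
                obj.marked = true;
                obj.children.clone()
            }
            _ => return, // already marked, or a freed slot
        };
        for c in children {
            mark(heap, c);
        }
    }

    // Sweep: a linear pass over every slot; pointers inside objects are
    // never examined. Time is proportional to the size of the heap.
    fn sweep(heap: &mut Vec<Option<Obj>>) {
        for slot in heap.iter_mut() {
            let keep = matches!(slot, Some(obj) if obj.marked);
            if keep {
                slot.as_mut().unwrap().marked = false; // survivor: reset for next cycle
            } else {
                *slot = None; // never reached by the mark trace: reclaim
            }
        }
    }

    fn collect(heap: &mut Vec<Option<Obj>>, roots: &[usize]) {
        for &r in roots {
            mark(heap, r);
        }
        sweep(heap);
    }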
That's a linear traversal of the heap, not a trace. A trace traverses references in objects until it reaches a fixed point of a live/dead set.
Yeah!
Also, the closest thing to a graph search in reference counting, as most of us understand it, occurs at a totally different level in the stack.
In a GC: the graph search is hidden from view and implemented in any number of clever ways by the language runtime. Or, if you’re implementing the GC yourself, it sits out of the way of normal operations on your “heap”.
In ref counting: a downref to zero triggers a destructor. That destructor can do anything. It so happens that if the type has reference counted references, then the destructor will, either automatically or manually, invoke downref operations. All of this is part of the language’s normal execution semantics.
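For instance (Widget/Texture are made-up names): nothing below involves a runtime service; it’s just destructors invoking destructors as part of normal execution.

    use std::rc::Rc;

    struct Texture;
    impl Drop for Texture {
        fn drop(&mut self) {
            println!("texture freed"); // runs when its strong count reaches zero
        }
    }

    struct Widget {
        skin: Rc<Texture>,
        icon: Rc<Texture>,
    }
    // No hand-written Drop needed: the compiler-generated drop glue for Widget
    // downrefs `skin` and `icon`, which may in turn free the Texture.

    fn main() {
        let tex = Rc::new(Texture);
        let w = Widget { skin: Rc::clone(&tex), icon: Rc::clone(&tex) };
        drop(tex); // downref: the texture is still owned by w
        drop(w);   // Widget's drop glue downrefs both fields -> "texture freed"
    }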
To say that these things are alike is a stretch. You could use similar logic to say that anyone doing a graph search is collecting garbage.