Comment by NobleExpress

3 years ago

> RC is super cheap. Seriously. You can do about several billion of them per second.

Right, but bump allocation is still faster. RC adds an extra operation on top of each allocation request, and since naive RC can't move objects, it necessarily needs a free-list allocator. Free-list allocators can allocate pretty fast (for the most common object sizes), but they can't reach bump-allocator speeds. Furthermore, bump allocators have better locality of reference than free-list allocators.

Also, I've never disputed that non-atomic increments/decrements are efficient; they are exceptionally fast. However, you are still converting every pointer read operation into a pointer write operation. The rule of thumb is that programs perform roughly 10x more pointer reads than pointer writes, which is also why read barriers are considered more costly than write barriers.

> I did actually provide proof by the way. Apple’s phones use half the RAM as Android and are at least as equally fast even if you discount better HW.

I don't really agree with this statement. There are far too many unknown/unaccounted-for variables: it could be better hardware, Android could simply have a worse architecture, or, as you said, Swift RC could genuinely be better than ART's GC. The point is that it's not a scientific comparison; we don't know whether the benefits in iOS come from RC. And we won't know until we have two systems identical in every parameter _except_ that one uses RC and the other uses a tracing GC, with both run on the same hardware and the same set of benchmarks.

> That’s only kind of true. Good allocators seem to mitigate this problem quite effectively

Yes. Note that mimalloc literally has a concept of "deferred frees", effectively emulating GC, because otherwise freeing a single object can trigger an unbounded cascade of recursive frees (for example, when dropping a large linked list).
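The linked-list case is easy to reproduce in Rust: the compiler-generated destructor for a `Box`-linked list recurses one stack frame per node, so a long enough list overflows the stack. The conventional fix, sketched below, is the same spirit as deferring frees: unlink and free the nodes iteratively in a loop instead of letting the destructor recurse (this is an illustration of the recursion problem, not mimalloc's actual API).

```rust
// Naive singly linked list. With the derived/default destructor,
// dropping the head would recurse through every node. This manual
// Drop frees the tail iteratively instead.
struct Node {
    next: Option<Box<Node>>,
}

impl Drop for Node {
    fn drop(&mut self) {
        // Detach the tail and free it one node at a time.
        let mut cur = self.next.take();
        while let Some(mut node) = cur {
            cur = node.next.take();
            // `node` is dropped here with `next` already None,
            // so its own Drop does no further recursion.
        }
    }
}

fn main() {
    // Build a list long enough that recursive destruction
    // would overflow the stack (~1M nodes).
    let mut head = Node { next: None };
    for _ in 0..1_000_000 {
        head = Node { next: Some(Box::new(head)) };
    }
    drop(head); // iterative teardown: completes fine
    println!("dropped without overflow");
}
```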

> I’m just saying that good memory management (made fairly easy in Rust) is always going to outperform tracing GC the same way optimized assembly will outperform the compiler.

Sure. A perfect memory manager would be omniscient: it would know exactly when an object is no longer needed and free it at that moment. But we don't have perfect memory managers yet. So yes, I agree in principle, but we have to be pragmatic.

> And I can’t belabor this point enough - languages with RC use it rarely as shared ownership is rarely needed and you can typically minimize where you use it.

I feel like that entirely depends on the problem domain. I'm not sure where you're getting "shared ownership is rarely needed" from; maybe it's true for your problem domain, but it may not be for others. And if it is true for yours, then great: you can use RC and optimize your programs that way.

> A tracing garbage collector still needs to do atomic reads of data structures which potentially requires cross cpu shoot downs.

Sure. GC metadata needs to be accessed and updated atomically; 100% agree with you. But the number of those operations is likely orders of magnitude lower than what you would get with naive RC.

> as Swift and ObjC demonstrate, it’s generally good enough without any serious performance implications (throughput or otherwise).

In one of my other comments about Swift in this thread, the two papers I linked report up to 80% of benchmark execution time being spent in ARC operations! I'll note that the papers are around 6-7 years old, so the situation might have changed drastically since then, but I haven't personally found many contemporary evaluations of Swift RC.