Comment by admax88qqq

3 years ago

There was this project though.

https://github.com/Roldak/AGC

The best part is that it's faster than manual management. People will tell you they need to do malloc and free manually for performance, but when you actually run the numbers, GC wins for a majority of use cases.

Tracing garbage collectors don’t generally win against reference counting collectors, especially when those reference counts are automatically elided via ARC (e.g. Swift and Objective-C) or rarely needed in the first place thanks to composition (C++ and Rust). Additionally, different allocation strategies are better depending on the use case (e.g. a pool allocator that you bulk-drop at the end of some computation).

What papers are you referencing that show tracing GCs outperforming these approaches? If it’s just the website, I think it’s an artifact of a micro benchmark rather than something that holds true for non-trivial programs.

  • I've always heard the "Swift elides reference counts" claim but I've never seen it substantiated. I don't claim to be a Swift GC expert by any means, but the impression I get from the two Swift GC papers I've read [1, 2] is that Swift has a very simple implementation of RC. The RC optimization document (albeit incomplete) [3] also doesn't give me the impression that Swift is doing much eliding of reference counts (I'm sure it does it for simple cases).

    Do you have any links which might explain what kind of eliding Swift is doing?

    EDIT: The major RC optimizations I have seen which elide reference-count operations are deferral and coalescing, and I'm fairly certain that Swift is doing neither.

    [1]: https://dl.acm.org/doi/abs/10.1145/3170472.3133843

    [2]: https://doi.org/10.1145/3243176.3243195

    [3]: https://github.com/apple/swift/blob/main/docs/ARCOptimizatio...

  • The conventional wisdom is that evacuating (copying) GCs win over malloc/free since 1) the GC touches only the live data and not the garbage, and 2) it compacts the active memory periodically, which improves its cache and (when relevant) paging hit rates.

    Obviously though, this will be situation dependent.

  • Then why does every performant managed language opt for a tracing GC when it can?

    RC is used in lower level languages because it doesn’t require runtime support, and can be implemented as a library.

    As I wrote in another comment, even with elisions, you are still trading off constant writes on the working thread for parallel work, and you even have to pay for synchronization in parallel contexts.

    • Because tracing GCs can collect reference cycles, which RC can’t. So at the language level, where you have to handle all sorts of programs written by programmers of varying quality (+ mistakes), a tracing GC gives more predictable memory behavior across a broader range of programs.

      Seriously. A single-threaded reference counter is super cheap. Cross-thread reference counts shouldn’t be used and I think are an anti-pattern; it’s better to have the owning thread be responsible for maintaining the reference count and pass a borrow via IPC that the borrower has to hand back. There is also hybrid RC, where you use Arc across threads but Rc within a thread. This gives you the best of both worlds with minimal cost. Which model you prefer is probably a matter of taste.

      CPUs are stupid fast at incrementing and decrementing a counter. Additionally, most allocations should be done on the stack, with a small amount on the heap for data that is larger or needs to outlive the current scope. I’ve written all sorts of performance-critical programs (including games) and never once has shared_ptr in C++ (which is atomic) shown up in the profiler, because the vast majority of allocations are on-stack, value composition, or unique_ptr (i.e. no GC of any kind needed).

      The fastest kind of GC is one where you don’t need any (i.e. Box / unique_ptr). The second fastest is an inline increment that’s likely in your CPU cache. I don’t think anyone can claim that pointer chasing is “fast”, and certainly not faster than ARC. Again, this assumes you’re not being careless and throwing ARC around everywhere when it’s not needed in the first place. Value composition is much more powerful; save Rc / Arc for when you have a more complicated object graph with shared ownership (and even then, try to give ownership to the root uniquely or through RC, handing out references to children and RC handles to peers).

  • Swift and Objective-C ARC performance is quite poor.

    • Compared to what, though? And is that still the case if all OS components use whatever it is, as opposed to a few applications? Memory efficiency is crucial for overall system performance, and ARC is highly memory-efficient compared to every production GC I’m aware of.