Fil's Unbelievable Garbage Collector

10 days ago (fil-c.org)

Hmm, Fil-C seems potentially really important; there's a lot of software that only exists in the form of C code which it's important to preserve access to, even if the tradeoffs made by conventional C compilers (accepting large risks of security problems in exchange for a small improvement in single-core performance) have largely become obsolete.

The list of supported software is astounding: CPython, SQLite, OpenSSH, ICU, CMake, Perl5, and Bash, for example. There are a lot of things in that list that nobody is likely to ever rewrite in Rust.

I wonder if it's feasible to use Fil-C to do multitasking between mutually untrusted processes on a computer without an MMU? They're making all the right noises about capability security and nonblocking synchronization and whatnot.

Does anyone have experience using it in practice? I see that https://news.ycombinator.com/item?id=45134852 reports a 4× slowdown or better.

The name is hilarious. Feelthay! Feelthay!

  • > I wonder if it's feasible to use Fil-C to do multitasking between mutually untrusted processes on a computer without an MMU?

    You could. That said, FUGC’s guts rely on OS features that in turn rely on an MMU.

    But you could make a version of FUGC that has no such dependency.

    As for perf - 4x is the worst case and that number is out there because I reported it. And I report worst case perf because that’s how obsessive I am about realistically measuring, and then fanatically resolving, perf issues

    Fact is, I can live on the Fil-C versions of a lot of my favorite software and not tell the difference

    • > As for perf - 4x is the worst case and that number is out there because I reported it

      I love the concept of Fil-C but I find that with the latest release, a Fil-C build of QuickJS executes bytecode around 30x slower than a regular build. Admittedly this is an informal benchmark running on a GitHub CI runner. I’m not sure if virtualization introduces overheads that Fil-C might be particularly sensitive to (?). But I’ve sadly yet to see anything close to a 4x performance difference. Perhaps I will try running the same benchmark on native non-virtualized x86 later today.

      Also, so I am not just whining, my Fil-C patch to the QuickJS main branch contains a fix for an issue that’s only triggered by regex backtracking, and which I think you might have missed in your own QuickJS patch:

      http://github.com/addrummond/jsockd/blob/main/fil-c-quickjs....

      4 replies →

    • A Fil-C kernel that ran the whole system in the same address space, safely, would sure be something. Getting rid of the overhead of hardware isolation could compensate for some of the overhead of the software safety checks. That was the dream of Microsoft's Singularity project back in the day.

      I guess there would be no way to verify that precompiled user programs actually enforce the security boundaries. The only way to guarantee safety in such a system would be to compile everything from source yourself.

      22 replies →

    • How would you go about writing a program/function that runs as close to native speed as possible on Fil-C?

      How much more memory do GC programs tend to use?

      Curious: how do you deal with interior pointers, and with not being able to store type info in object headers the way most GC languages do? (Since placement new is a thing, you can't have malloc allocate a header and then return the memory following it, and pointer types can lie about what they contain.)

      You mention 'accurate' by which I assume you use the compiler to keep track of where the pointers are (via types/stackmaps).

      How do you deal with pointers that get cast to ints, and then back?

      1 reply →

    • When you run the Fil-C versions of your favourite software, does it have a sanitizer mode that reports bugs like missing free() etc? And have you found any bugs this way?

      26 replies →

    • Yeah, I meant to be clear that 4× was the worst case, and I think it's an impressive achievement already, and perfectly fine for almost everything. After all, running single-threaded software on an 8-core CPU is already an 8× slowdown, right? And people do that all the time!

      What's the minimal code size overhead for FUGC?

      3 replies →

  • > The list of supported software is astounding: CPython, SQLite, OpenSSH, ICU, CMake, Perl5, and Bash, for example. There are a lot of things in that list that nobody is likely to ever rewrite in Rust.

    Interestingly, I agree with your point in general that there's a lot of software that Fil-C might be a good fit for, but I hesitate to say that about any of the examples you listed:

    * CPython and Perl5 are the runtimes for notoriously slow GCed languages, and adding the overhead of a second GC seems...inelegant at best, and likely to slow things down a fair bit more.

    * Some of them do have reimplementations or viable alternatives in Rust (or Go or the like) underway, like Turso for SQLite.

    * More generally, I'd call these foundational, widely-used, actively maintained pieces of software, so it seems plausible to me that they will decide to RiiR.

    I think the best fit may be for stuff that's less actively maintained and less performance-critical. There's 50 years of C programs that people still dig out of the attic sometimes but aren't putting much investment into, and which are running on hardware vastly more powerful than what these programs were written for.

    • Yeah, for that reason perhaps Perl5 is a better example than CPython, but something less widely used might be a better example. tcsh, say.

  • Note that the power of SQLite being written in C is its portability to non-standard OSes. [0] I've used it on an embedded real-time μC/OS-II variant. [1]

    The architecture of embedded solutions is different from desktop and server. For example, to prevent memory fragmentation and to keep performance high, you don't free memory; you mark that memory (object / struct) as reusable. It is similar to customized heap allocation or pooling.

    [0] https://sqlite.org/vfs.html [1] https://en.wikipedia.org/wiki/Micro-Controller_Operating_Sys...

  • > There are a lot of things in that list that nobody is likely to ever rewrite in Rust.

    How many years away are we from having AI-enhanced static analysis tools that can accurately look at our C code (after the fact or while we're writing it) and say "this will cause problems, here's a fix" with a level of accuracy sufficient that we can just continue using C?

  • "I wonder if it's feasible to use Fil-C to do multitasking between mutually untrusted processes on a computer without an MMU?"

    Even if it worked for normal data flow, that's the sort of thing that's bound to introduce covert channels, I'd have thought. To start with, I guess you have immediately disabled the Meltdown/Spectre mitigations, because doesn't that happen when you switch processes?

    • Yes, it definitely will not work to plug covert channels or side-channel attacks like Spectre. Typically, computers without MMUs also don't have speculative execution, or in most cases even caches, so Spectre specifically wouldn't be relevant, but lots of other timing side channels would. Maybe other side channels like EMI and power consumption as well.

      But consider, for example, decoding JPEG, or maybe some future successor to JPEG, JEEG, by the Joint Evil Experts Group. You want to look at a ransom note that someone in the JEEG has sent you in JEEG format so that you know how much Bitcoin to send them. You have a JEEG decoder, but it was written by Evil Experts, so it might have vulnerabilities, as JPEG implementations have in the past, and maybe the ransom note JEEG is designed to overflow a buffer in it and install a rootkit. Maybe the decoder itself is full of malicious code just waiting for the signal to strike!

      If you can run the JEEG decoder in a container that keeps it from accessing the network, writing to the filesystem, launching processes, executing forever, allocating all your RAM, etc., only being permitted to output an uncompressed image, even if you let it read the clock, it probably doesn't matter if it launches some kind of side-channel attack against your Bitcoin wallet and your Bitchat client, because all it can do is put the information it stole into the image you are going to look at and then discard.

      You can contrive situations where it can still trick you into leaking bits it stole back to the JEEG (maybe the least significant bits of the ransom amount) but it's an enormous improvement over the usual situation.

      Then, FalseType fonts...

      3 replies →

  • With improvements in coding agents, rewriting code in rust is pretty damn easy, and with a battle tested reference implementation, it should be easy to make something solid. I wouldn't be surprised if we have full rewrites of everything in rust in the next few years, just because it'll be so easy.

    • I have had better experiences with LLMs translating code from one language to another than writing code from scratch, but I don't think the current state of LLMs makes it "pretty damn easy" to rewrite code in Rust, especially starting from garbage-collected languages like Perl or Lua.

      Certainly it's plausible that in the next few years it'll be pretty damn easy, but with the rapid and unpredictable development of AI, it's also plausible that humanity will be extinct or that all current programming languages will be abandoned.

      5 replies →

    • I don’t buy it but let’s say that in the best case this happens.

      Then we’ll have a continuation of the memory safety exploit dumpster fire because these Rust ports tend to use a significant amount of unsafe code.

      On the other hand, Fil-C has no unsafe escape hatches.

      Think of Fil-C as the more secure but slower/heavier alternative to Rust

      7 replies →

  • SQLite in Rust https://github.com/tursodatabase/turso

    CPython in Rust https://github.com/RustPython/RustPython

    Bash in Rust https://github.com/shellgei/rusty_bash

    • Turso says:

      > Warning: This software is ALPHA, only use for development, testing, and experimentation. We are working to make it production ready, but do not use it for critical data right now.

      https://rustpython.github.io/pages/whats-left says:

      > RustPython currently supports the full Python syntax. This is “what’s left” from the Python Standard Library.

      Rusty_bash says:

      > Currently, the binary built from alpha repo has passed 24 of 84 test scripts.

      The CPython implementation is farther along than I had expected! I hope they make more progress.

      7 replies →

This feels like one of those rather rare projects that is both sailing pretty close to research, and also yielding industrially useful results -- I love it! Normally I'd expect to see something like this coming out of one of the big tech companies, where there are enough advertising megabucks to pay a small team to work on a project like this (providing someone can make the business case...). So I'm curious: what was the initial motivation for this work? Assuming this is not a passion project, who is funding it? How many person years of work has this involved? What is the end game?

It is great that Fil-C exists. This is the sort of technique that is very effective for real programs, but that developers are convinced does not work. Existence proofs cut through long circular arguments.

  • What do the benchmarks look like? My main concern with this approach would be that the performance envelope would eliminate it for the use-cases where C/C++ are still popular. If throughput/latency/footprint are too similar to using Go or what have you, there end up being far fewer situations in which you would reach for it.

    • Some programs run as fast as normally. That's admittedly not super common, but it happens.

      Some programs have a ~4x slowdown. That's also not super common, but it happens.

      Most programs are somewhere in the middle.

      > for the use-cases where C/C++ are still popular

      This is a myth. 99% of the C/C++ code you are using right now is not perf sensitive. It's written in C or C++ because:

      - That's what it was originally written in and nobody bothered to write a better version in any other language.

      - The code depends on a C/C++ library and there doesn't exist a high quality binding for that library in any other language, which forces the dev to write code in C/C++.

      - C/C++ provides the best level of abstraction (memory and syscalls) for the use case.

      Great examples are things like shells and text editors, where the syscalls you want to use are exposed at the highest level of fidelity in libc and if you wrote your code in any other language you'd be constrained by that language's library's limited (and perpetually outdated) view of those syscalls.

      79 replies →

  • That has been the qualm of programming since the Assembly days; unfortunately, most developers aren't Ivan Sutherland, Alan Kay, Steve Jobs, Bret Victor, or other similarly minded visionaries.

    Most of us sadly cargo-cult urban myths and, like Saint Thomas, only believe in what is running in front of us, so we never get any feeling for how great some things could be.

    Hence so many UNIX and C clones instead of something new; to be honest, those two guys were also visionaries back in the 1970s, despite some of the flaws.

Given the goal is to work with existing C programs (which already have free(...) calls "carefully" placed), and you're already keeping separate bounds info for every pointer, I wonder why you chose to go with a full GC rather than lock-and-key style temporal checking[1]? The latter would make memory usage more predictable and avoid the performance overhead and scheduling headaches of a GC.

Perhaps storing the key would take too much space, or checking it would take too much time, or storing it would cause race condition issues in a multithreaded setting?

[1] https://acg.cis.upenn.edu/papers/ismm10_cets.pdf

  • I think the lock and key approaches don’t have Fil-C’s niftiest property: the capability model is totally thread safe and doesn’t require fancy atomics or locking in common cases

  • Also find it interesting that you're allowing out-of-bounds pointer arithmetic as long as no dereference happens, which is a class of UB compilers have been known to exploit ( https://stackoverflow.com/questions/23683029/is-gccs-option-... ). Do you disable such optimizations inside LLVM, or does Fil-C avoid this entirely by breaking pointers into pointer base + integer offset (in which case I wonder if you're missing out on any optimizations that work specifically on pointers)?

    • For starters, llvm is a lot less willing to exploit that UB

      It’s also weird that GCC gets away with this at all as many C programs in Linux that compile with GCC make deliberate use of out of bounds pointers.

      But yeah, if you look at my patch to llvm, you’ll find that:

      - I run a highly curated opt pipeline before instrumentation happens.

      - FilPizlonator drops flags in LLVM IR that would have permitted downstream passes to perform UB driven optimizations.

      - I made some surgical changes to clang CodeGen and some llvm passes to fix some obvious issues from UB

      But also let’s consider what would happen if I hadn’t done any of that except for dropping UB flags in FilPizlonator. In that case, a pass before pizlonation would have done some optimization. At worst, that optimization would be a logic error or it would induce a Fil-C panic. FilPizlonator strongly limits UB to its “memory safe subset” by construction.

      I call this the GIMSO property (garbage in, memory safety out).

      4 replies →

This is really cool! I noticed

>The fast path of a pollcheck is just a load-and-branch.

A neat technique I've seen used to avoid these branches is documented at https://android-developers.googleblog.com/2023/11/the-secret... under "Implicit suspend checks".

  • Yeah that’s a common optimization for poll checks. I think most production JVMs do that.

    I’m very far from doing those kinds of low level optimizations because I have a large pile of very high level and very basic optimizations still left to do!

    • We did it for MaxineVM back in the day, having the thread-local-storage point to itself and the safepoint as a load back into the same register. The problem is that that introduces a chain of dependent loads for all safepoints and for all operations that use thread-local storage. That seems like it would hurt OOE and IPC as a result.

      I am working on adding threads to Virgil (slowly, in the background, heh). I'll use the simple load+branch from the TLS for the simple reason that the GC code is also written in Virgil and it must be possible to turn off safepoints that have been inserted into the GC code itself, which is easy and robust to do if they are thread-local.

I'm not sure I understand all of what they're doing there, but I did read the referenced Doligez-Leroy-Gonthier paper a while back and I am glad someone is doing something with that in a non-academic (maybe?) context. That paper looked promising to me when I read it, but I basically had no faith that it would ever make it out of academia because the algorithm is so complex. It took me a really long time to think I understood it, and it's one of those things I actually am not confident I could implement even when I understood it (I certainly don't understand it now).

  • I don’t think I’m the only one doing something in the vicinity of that paper. I think close relatives of DLG shipped in some prod JVMs.

    • Interesting. I've read a lot about some complex JVMs, but I guess maybe they didn't cite their sources and I didn't make the connection on my own.

Pretty wild to see a concurrent, non-moving GC strapped onto plain C. If I can take a mid-size C codebase and trade ~2–3× runtime for fewer memory footguns, I’d take it. How rough is incremental adoption—per target, or all-in toolchain?

I love C, performance and security. Between this garbage collector and the capability enforcement, this is appealing. I've thought a few times about what a more secure C would look like, brushing over capability concepts here and there, but I don't belong near compiler code.

How hard would it be to support Windows?

  • <off-topic> Took me far too long to understand your opening sentence; eventually realised the Oxford comma would've helped me out. Rare to see a clear example in the wild.

    • The potential for misinterpretation didn't escape me, but I was also confident that there was sufficient context to understand it. An Oxford comma there would have suggested a more careful cadence if it were spoken, which is not what I wanted.

      As you said, it can be rare to see a case where it truly is ambiguous, but the context here negates that well enough.

      2 replies →

I think it is really cool that someone is going hard at this part of the design space from an engineering geek standpoint even though I can’t use it myself.

IMHO Garbage collection is and always was an evolutionary dead end. No matter how nice you make it, it feels wrong to make a mess and have someone else clean it up inefficiently at some point later.

And because of that it always involves some sort of hidden runtime cost which might bite you eventually and makes it unusable for many tasks.

I'd rather have my resource management verified at compile time and with no runtime overhead. That this is possible is proven by multiple languages now.

That being said, I can imagine some C programs for which using Fil-C is an acceptable trade-off because they just won't be rewritten in language that is safer anytime soon.

  • > IMHO Garbage collection is and always was an evolutionary dead end. No matter how nice you make it, it feels wrong to make a mess and have someone else clean it up inefficiently at some point later.

    There are problem domains where tracing garbage collection simply cannot be avoided. This is essentially always the case when working with problems that involve constructing arbitrary spaghetti-like reference graphs, possibly with cycles. There's no alternative in that case to "making a mess" and dealing with it as you go, because that requirement is inherent in the problem itself.

    It would be interesting to have a version of Fil-C that could work with a memory-safe language like Rust to allow both "safe" leaf references (potentially using ownership and reference counting to represent more complex allocated objects, but would not themselves "own" pointers to Fil-C-managed data, thus avoiding the need to trace inside these objects and auto-free them) and general Fil-C managed pointers with possible cycles (perhaps restricted to some arena/custom address space, to make tracing and collecting more efficient). Due to memory safety, the use of "leaf" references could be ensured not to alter the invariants that Fil-C relies on; but managed pointers would nonetheless be available whenever GC could not be avoided.

    • > There are problem domains where tracing garbage collection simply cannot be avoided.

      Can you expound on that? I've been doing this for a while and haven't seen such a domain yet.

      2 replies →

  • The topic has been debated for decades. It's your opinion, but it's a pretty reductionist and basically religious one at this point. All memory management has costs, at compile time and run time, and in cognitive overhead. You can move costs around, reduce them, and avoid some, but you'll never get away from it.

    > it feels wrong to make a mess and have someone else clean it up inefficiently at some point later.

    Yet no one argues for manual register allocation anymore and will gleefully use dozens or even hundreds of locals and then thousands of functions, just trusting the compiler to sort it all out.

    We make progress by making the machines implement our nice abstractions.

  • Engineering is about tradeoffs.

    When I write in Rust, the process uses very little RAM. BUT, I often spend a lot of time working through ownership issues and other syntax sugar to prove that memory is cleaned up correctly.

    When I write in garbage collected languages, I can move a lot faster. Yes, the process uses more RAM, but I can finish a lot more quickly. Depending on the program that I'm writing, finishing quickly may be more important than using as little RAM as possible.

    Furthermore, "which is better" isn't always clear. If you're relying on reference counting (smart pointers; or ARC or RC in Rust), you could actually spend more CPU cycles maintaining the count than an optimized garbage collector will spend finding free memory.

    (IE, you spend a lot of time working in a RAM efficient language only to end up with a program that trades off RAM efficiency for CPU efficiency. Or even worse, you might miss your window for building a prototype or testing a feature because you became obsessed with a metric that just doesn't matter.)

    These are very critical tradeoffs to understand when you make statements like "Garbage collection is and always was an evolutionary dead end," "it feels wrong to make a mess and have someone else clean it up inefficiently at some point later," and "hidden runtime cost".

    (Remember, sometimes maintaining a reference count uses more CPU than an optimized garbage collector.)

    • Thought you made some very good points. Being faster to getting a working prototype and beta out the door, can be the difference between success versus failure for many, even though it might be at the cost of a bit more ram or it being a little slower.

      Other related points are: 1) Feeling the need to be a bit faster versus an actually critical necessity. 2) GC is "bad" and manual is "good" silliness. 3) Premature optimization or unnecessary optimization.

      These can be personal or even psychological issues, not reflected in reality. Kind of like a guy who spends globs of time and money on building a "must have" monster 10-second car, but 99% of the time, no one ever needs to go that fast. Speed definitely has its place, but is also relative to what is being done, the situation, and the hardware used.

  • Garbage collection performance lies along a memory-overhead vs time-overhead curve which can be tuned. Manual memory management sits at one end (minimal memory overhead) but is often slower than garbage collection if you are willing to accept some space overhead. Observe that the most performance sensitive programs worry a lot about allocation whether or not they use garbage collection.

  • > IMHO Garbage collection is and always was an evolutionary dead end.

    An evolutionary dead end that is used by probably upwards of 90% of all productively running code, and is the subject of lots of active research, as evidenced by TFA?

    > No matter how nice you make it, it feels wrong to make a mess and have someone else clean it up inefficiently at some point later.

    > And because of that it always involves some sort of hidden runtime cost

    Because of what? Your feelings?

  • >I'd rather have my resource management verified at compile time and with no runtime overhead

    Malloc and free have runtime overhead — sometimes more than the overhead of garbage collection.

    The only way to have no overhead is to statically allocate fixed sized buffers for everything.

Super cool project. Sorry if you explained this already, I don't know what "Dijkstra accurate" means. How does it know if an object is truly available to be reclaimed, given that pointers can be converted to integers?

  • > given that pointers can be converted to integers?

    Because if they get converted to integers and then stored to the heap then they lose their capability. So accesses to them will trap and the GC doesn’t need to care about them.

    Also it’s not “Dijkstra accurate”. It’s a Dijkstra collector in the sense that it uses a Dijkstra barrier. And it’s an accurate collector. But these are orthogonal things

    • Hmm, I'm still not understanding the bit of information that I'm trying to ask about.

      Let's say I malloc(42) then print the address to stdout, and then do not otherwise do anything with the pointer. Ten minutes later I prompt the user for an integer, they type back the same address, and then I try to write 42 bytes to that address.

      What happens?

      Edit: ok I read up on GC literature briefly and I believe I understand the situation.

      "conservative" means the garbage collector does not have access to language type system information and is just guessing that every pointer sized thing in the stack is probably a pointer.

      "accurate" means the compiler tells the GC about pointer types, so it knows about all the pointers the type system knows about.

      Neither of these are capable of correctly modeling the C language semantics, which allows ptrtoint / inttoptr. So if there are any tricks being used like xor linked lists, storing extra data inside unused pointer alignment bits, or a memory allocator implementation, these will be incompatible even with an "accurate" garbage collector such as this.

      I should add, this is not a criticism, I'm just trying to understand the design space. It's a pretty compelling trade offer: give up ptrtoint, receive GC.

      4 replies →

    • Out of curiosity, does this idiom work in fil-c?

      https://github.com/protocolbuffers/protobuf/blob/cb873c8987d...

            // This somewhat silly looking add-and-subtract behavior provides provenance
            // from the original input buffer's pointer. After optimization it produces
            // the same assembly as just casting `(uintptr_t)ptr+input_delta`
            // https://godbolt.org/z/zosG88oPn
            size_t position =
                (uintptr_t)ptr + e->input_delta - (uintptr_t)e->buffer_start;
            return e->buffer_start + position;
      

      It does use the implementation-defined behavior that a char pointer + 1 cast to uintptr_t is the same as casting to uintptr_t then adding 1.

      5 replies →

> Fil-C uses a parallel concurrent on-the-fly grey-stack Dijkstra accurate non-moving garbage collector called FUGC

Half of my hair turned white while trying to understand this.

Note that the "safepointing" logic is exactly the same thing that's needed in refcounting to atomic replace a field.

This article glosses over what I consider the hardest part - the enter/exit functionality around native functions that may block (but which must touch the allocator).

  • > Note that the "safepointing" logic is exactly the same thing that's needed in refcounting to atomic replace a field.

    No it's not, not even close.

    > This article glosses over what I consider the hardest part - the enter/exit functionality around native functions that may block (but which must touch the allocator).

    Yeah, that part is hard, and maybe I'll describe it in another post.

    Look for `filc_enter` and `filc_exit` in https://github.com/pizlonator/fil-c/blob/deluge/libpas/src/l...

> The only "pause" threads experience is the callback executed in response to the soft handshake, which does work bounded by that thread's stack height.

So this is probably not great for functional/deeply-recursive code I guess?

  • Meh.

    The stack scan is really fast. There's not a lot of logic in there. If you max out the stack height limit (megabytes of stack?) then maybe that means milliseconds of work to scan that stack. That's still not bad.

I love garbage collector design and impl. It’s one of those “go to” thing to do when learning a new language.

Never heard of this one, looking forward to diving in this weekend.

This looks pretty amazing, I'm surprised I haven't heard of it before. Looking forward to trying it out. Seems like a good way to verify the safety of some programs, even if not feasible for production due to performance constraints. Though we have sanitizers for tests, this seems more complete.

Will Fil-C support ARM soon, or for that matter RISC-V? Safety is super important for embedded devices running C or C++.

I’m curious how expensive the write barrier is in practice? IIRC, it is often write barriers that most affect performance sensitive programs running on non-concurrent garbage collectors (and perhaps safepoints that can cause problems for performance sensitive threads when running with a concurrent gc).

How can the GC be precise when C and C++ allow casting between pointers and integers?

This is very cool. Also, I wonder: does FUGC work without InvisiCaps? What would it be like without the capability pointer? Does it just become not accurate but still usable?

I skimmed the article.

I'm curious about CPU / RAM overhead tradeoffs.

IE, in many cases GC is more CPU-efficient than reference counting, but the tradeoff is higher RAM consumption.

Can anyone explain how the roots are known to the GC? I can't spot it. Does it have some kind of precompile step to mark roots for the GC scan?

@pizlonator ?

Why did you choose to use an advancing wavefront rather than retreating wavefront strategy?

I'm amazed by how readable the source is, the names meaningful and not filled with jargon.

The description left me confused. Is this something that has to be integrated into the compiler?

  • It’s an implementation of a very C-compatible language called Fil-C: a fork of LLVM that adds the FilPizlonator compiler pass, combined with runtime support.

Just skimmed through the docs - this is super interesting. Love the idea of a minimal GC with no runtime dependencies. Anyone here tested it with something more intense than toy benchmarks? Curious how it handles edge cases under memory pressure or concurrency. Also, would be great to hear how it compares to Boehm or Rust’s drop model in real-world workloads.