Notes by djb on using Fil-C

6 days ago (cr.yp.to)

To summarize, he's sufficiently impressed with it that he's embarking on an attempt to rebuild an entire Debian system with it, and he's written some software (a GC shim library and build scripts) that are likely to be of interest to others who are attempting the same thing.

> I had originally configured the server phoenix with only 12GB swap. I then had to restart ./build_all_fast_glibc.sh a few times because the Fil-C compilation ran out of memory. Switching to 36GB swap made everything work with no restarts; monitoring showed that almost 19GB swap (plus 12GB RAM) was used at one point. A larger server, 128 cores with 512GB RAM, took 8 minutes for Fil-C plus 6 minutes for musl, with no restarts needed.

Yikes that’s a lot of memory! Filc is doing a lot of static analysis apparently.

For those who might miss it, the notes cite a new 64-bit version of cdb that supports exabyte databases

https://cdb.cr.yp.to

Also maybe of interest is that the new cdb subdomain is using pqconnect instead of dnscurve

  • > Also maybe of interest is that the new cdb subdomain is using pqconnect instead of dnscurve

    This is not correct. There isn't a cdb subdomain because cdb.cr.yp.to doesn't have NS records, which is where DNSCurve fits in. If you have a DNSCurve resolver, then your queries for cdb.cr.yp.to will use DNSCurve and will be sent to the yp.to nameservers.

    From there, if you have pqconnect, your http(s) connection to cdb.cr.yp.to will happen over pqconnect.

    Maybe the confusion is because both DNSCurve and pqconnect encode pubkeys in DNS, but they do different things.

    Here is DNSCurve:

      $ dig +short ns yp.to
      uz5jmyqz3gz2bhnuzg0rr0cml9u8pntyhn2jhtqn04yt3sm5h235c1.yp.to.
    

    Here is pqconnect:

      $ dig +short cdb.cr.yp.to
      pq1htvv9k4wkfcmpx6rufjlt1qrr4mnv0dzygx5mlrjdfsxczbnzun055g15fg1.yp.to.
      131.193.32.108
    

    Like CurveCP, pqconnect puts the pubkey into a CNAME.

  • RFC 1034 Domain Concepts and Facilities November 1987 [Page 8]

    "A domain is identified by a domain name, and consists of that part of the domain name space that is at or below the domain name which specifies the domain. A domain is a subdomain of another domain if it is contained within that domain. This relationship can be tested by seeing if the subdomain's name ends with the containing domain's name. For example, A.B.C.D is a subdomain of B.C.D, C.D, D, and " "."

       1 cdb.cr.yp.to - regular DNS:
       124 bytes, 1+2+0+0 records, response, noerror
       query: 1 cdb.cr.yp.to
       answer: cdb.cr.yp.to 30 CNAME pq1jbw2qzb2201xj6pyx177b8frqltf7t4wdpp32fhk0w3h70uytq5020w020l0.yp.to
       answer: pq1jbw2qzb2201xj6pyx177b8frqltf7t4wdpp32fhk0w3h70uytq5020w020l0.yp.to 30 A 131.193.32.109
    

    In the terminology of RFC1034, cdb.cr.yp.to, a CNAME, can be described as a subdomain of cr.yp.to and yp.to

    (NB. The pq1 portion is not a public key, it is a hash of a server's long-term public key)

  • The PQConnect documentation, specifically the document "INSTALL.md", describes the pq1 portion of the CNAME as a subdomain.

       Please update your DNS A/AAAA records for all domains on this server as follows:
    
       Existing record:
       Type    Name        Value
       A/AAAA  SUBDOMAIN   IP Address
    
       New Records:
       Type    Name        Value
       CNAME   SUBDOMAIN   pq1XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.DOMAIN.TLD
       A/AAAA  pq1XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX  IP Address
       TXT    pq1XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.DOMAIN.TLD    p=42424
       TXT    ks.pq1XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.DOMAIN.TLD    ip=IP ADDRESS;p=42425"

Cool project! I take it the goal is that, overhead being acceptable, most C / C++ programmes don't actually "have to be" rewritten in something like Rust?

I wonder how / where Epic Games comes in?

  • Note that Fil-C is a garbage-collected language that is significantly slower than C.

    It's not a target for writing new code (you'd be better off with C# or golang), but something like sandboxing with WASM, except that Fil-C crashes more precisely.

    • From the topic starter: "I've posted a graph showing nearly 9000 microbenchmarks of Fil-C vs. clang on cryptographic software (each run pinned to 1 core on the same Zen 4). Typically code compiled with Fil-C takes between 1x and 4x as many cycles as the same code compiled with clang"

      Thus, Fil-C compiled code is 1 to 4 times as slow as plain C. This is not in the "significantly slower" ballpark, like where most interpreters are. The ROOT C/C++ interpreter is 20+ times slower than binary code, for example.

      19 replies →

    • WASM is a sandbox. It doesn't obviate memory safety measures elsewhere. A program with a buffer overflow running in WASM can still be exploited to do anything that program can do within in WASM sandbox, e.g. disclose information it shouldn't. WASM ensures such a program can't escape its container, but memory safety bugs within a container can still be plenty harmful.

      2 replies →

    • What language do people considering c as an option for a new project consider? Rust is the obvious one we aren't going to discuss because then we won't be able to talk about anything else, Zig is probably almost as well loved and defended, but it isn't actually memory safe, just much easier to be memory safe. As you say, c# and go, also maybe f# and ocaml if we are just writing simple c style stuff none of those would look all that different. Go jhs some ub related to concurrency that people run into, but most of these simple utilities are either single threaded or fine grained parallel which is pretty easy to get right. Julia too maybe?

      4 replies →

    • A GC lang isn't necessarily significantly slower than C. You should qualify your statements. Moreover, this is a variant of C, which means that the programs are likely less liberal with heap allocations. It remains to be seen how much of a slowdown Fil-C imposes under normal operating conditions. Moreover, although it is indeed primarily suited for existing programs, its use in new programs isn't necessarily worse than, e.g., C# or Go. If performance is the deciding factor, probably use Rust, Zig, Nim, D, etc. .

For those, like me, that didn’t know what Fil-C is:

> Fil-C is a fanatically compatible memory-safe implementation of C and C++. Lots of software compiles and runs with Fil-C with zero or minimal changes. All memory safety errors are caught as Fil-C panics. Fil-C achieves this using a combination of concurrent garbage collection and invisible capabilities (InvisiCaps). Every possibly-unsafe C and C++ operation is checked. Fil-C has no unsafe statement and only limited FFI to unsafe code.

https://fil-c.org/

The posted article has a detailed explanation of djb successfully compiling a bunch of C and C++ codebases.

  • I guess to get on board with this, it is my understanding you have to accept the premise of a Garbage Collector in the runtime?

    • Note that it is a garbage collector designed and implemented by one of the most experienced GC experts on earth. He previously designed and implemented WebKit's state of the art concurrent GC, for example. So—yes, but don't dismiss it too quickly.

      83 replies →

    • The author of Fil-C does have some ideas to avoid a garbage collector [1], in summary: Use-after-free at worst means you might see an object of the same size, but you can not corrupt data structures (no pointer / integer confusion). This would be more secure than standard C, but less secure than Fil-C with GC.

      [1] https://x.com/filpizlo/status/1917410045320650839

Can a program be written only partially in Fil-C? That is to say, can we link regular C and Fil+C object files in a single executable?

  • > There is no interoperability with Yolo-C (i.e. classic C). This is both a goal and the outcome of a non goal.

    https://fil-c.org/runtime

    (worth reading, i think all the stuff Fil writes is both super informative & quite entertaining.)

Is there a reason that some of the linked benchmarks, if I'm reading it right, have Fil-C running faster than C?[0] I assume it's just due to micro-benchmark variability but I'm curious. Some of them seem impossibly fast compared to C so I wonder if there are some correctness issue there.

[0] https://cr.yp.to/2025/20251028-filcc-vs-clang.html

  • Usually garbage collection does improve alot of benchmarks, just look at the hans boem gc benchmarks.

    • Back in the day, the cheat was to set up the GC so that the GC happened outside the timed portion of the benchmark. You know what's faster than the fastest GC? Not doing it.

      1 reply →

    • The two extreme outliers I see are labeled "aead/clx192q/opt,-O3" and "aead/schwaemm128128v2/opt,-Os" according to clicking on the points with devtools. aead/schwaemm128128v2/opt,-Os looks like it is almost at 0x. 1x is at about y = 659 and that test is at 769 out of I guess 780 based on the graph.

I’m glad Phil’s work is finally getting the recognition it deserves.

There may be useful takeaways here for Rust’s “unsafe” mode - particularly for applications willing to accept the extra burden of statically linking Fil-C-compiled dependencies. Best of both worlds!

  • > particularly for applications willing to accept the extra burden of statically linking Fil-C-compiled dependencies. Best of both worlds!

    As near as I can tell Fil-C doesn't support this, or any other sort of FFI, at all. Nor am I sure FFI would even make sense, it seems like an approach that has to take over the entire program so that it can track pointer provenance.

    • For securing and maintaining a complex legacy application it seems like a reasonable approach would be to move the majority into Fil-C, then hook the bits that don't fit up via RPC. Maybe some bits get formal verification, rewritten in Rust, ported to new platform APIs, whatever, but at least you get some safety for the whole app without a rewrite.

    • He could add an API to mint a capability out of thin air. It could even be done out of process.

      In fact, I think Fil-C and CHERI could implement 90% the same programmer-level API!

I would really like to see Omarchy go this direction. A fully memory-safe userland for Omarchy is possible with existing techhnology.

  • Can you elaborate why Omarchy? I'm asking, in context of recompiling with Fil-C, because that seems to be just Arch + configurations.

    • For cultural reasons, I would like Omarchy—culturally—to adopt straightforward security as one of their goals, in addition to usability and beauty.

      It's low hanging fruit, and a great way to further differentiate their Linux distribution.

Does Fil-C catch uninitialized memory reads?

  • malloc'd memory is zeroed in fil-c:

    > *zgc_alloc*

    > Allocate count bytes of zero-initialized memory. May allocate slightly more than count, based on the runtime's minalign (which is currently 16).

    > This is a GC allocation, so freeing it is optional. Also, if you free it and then use it, your program is guaranteed to panic.

    > libc's malloc just forwards to this. There is no difference between calling malloc and zgc_alloc.

    from https://fil-c.org/stdfil

Great to see some 3letter guy into this. This might be one of those rando things which gets posted on HN (and which doesn't involve me in the slightest), but a decade later is taking over the world. Rust and Go were like that.

Previously there was that Rust in APT discussion. A lot of this middle-aged linux infrastructure stuff is considered feature-complete and "done". Not many young people are coming in, so you either attract them with "heyy rewrite in rust" or maybe the best thing is to bottle it up and run in a VM.

I am a bit surprised that the build_all_fast_glibc.sh script requires 31Gbyte of memory to run. Can somebody explain? I would like to try out Fil-C.

I can't wait for all the delicious four-way flamewars. Choose your fighter!

1) Rewrite X in Rust

2) Recompile X using Fil-C

3) Recompile X for WASM

4) Safety is for babies

There are a lot of half baked Rust rewrites whose existence was justified on safety grounds and whose rationale is threatened now that HN has heard of Fil-C

  • Fil-C has come up on HN plenty of times before. If it was going to make much of a dent in the discussions, it would have by now.

    • It's strange how ideas seem to explode at random into the discourse despite being known for a long time. It's as if some critical mass stumbles on a thing and it becomes "the current thing" everyone talks about until the next current thing.

  • It's not an either-or (well, except for this last item).

    It seems sensible to not write new software in plain C. Rust is certainly a valid choice for a safer language, but in many cases overkill wrt how painful the rewrite is vs benefits gained from avoiding a higher-level memory-safe one like OCaml.

    At the same time, "let's just rewrite everything!" is also madness. We have many battle-tested libraries written in C already. Something like Fil-C is badly needed to keep them working while improving safety.

    And as for wasm, it's sort of orthogonal - whether you're writing in C or in Rust, the software may be bug-free, but sandboxing it may still be desirable e.g. as a matter of trust (or lack thereof). Also, cross-platform binaries would be nice to have in general.

    • > the software may be bug-free, but sandboxing it may still be desirable e.g. as a matter of trust (or lack thereof)

      Wouldn't the only cause of mistrust be bugs, or am I missing something? If the program is malicious, sandboxing isn't the pertinent action.

      2 replies →

Wish we were talking about making Fil-C required for apt, not Rust...

  • Those seems to be independent issues. Fil-C is about the best way to compile/run C code.

    Rust would be about what language to use for new code.

    Now that I have been programming in Rust for a couple of years, I don't want to go back to C (except for some hobby projects).

    • I agree. The main advantage of Fil-C is compatibility with C, in a secure way. The disadvantages are speed, and garbage collection. (Even thought, I read that garbage collection might not be needed in some cases; I would be very interested in knowing more details).

      For new code, I would not use Fil-C. For kernel and low-level tools, other languages seem better. Right now, Rust is the only popular language in this space that doesn't have these disadvantages. But in my view, Rust also has issues, specially the borrow checker, and code verbosity. Maybe in the future there will be a language that resolves these issues as well (as a hobby, I'm trying to build such a language). But right now, Rust seems to be the best choice for the kernel (for code that needs to be fast and secure).

      4 replies →

  • Fil-C is slow.

    There is no C or C++ memory safe compiler with acceptable performance for kernels, rendering, games, etc. For that you need Rust.

    The future includes Fil-C for legacy code that isn’t performance sensitive and Rust for new code that is.

  • I wish, we will have something like Fil-C as an option for unsafe Rust.

    • Fil-C works because you recompile the whole C userspace. Unsafe Rust doesn't do that... and for many practical purposes you probably want to touch the non-safe-version of the C userspace.

      Still, it's all LLVM, so perhaps unsafe Rust for Fil-space can be a thing, a useful one for catching (what would be) UBs even [Fil-C defines everything, so no UBs, but I'm assuming you want to eventually run it outside of Fil-space].

      Now I actually wonder if Fil-C has an escape hatch somewhere for syscalls that it does not understand etc. Well it doesn't do inline assembly, so I shouldn't expect much... I wonder how far one needs to extend the asm clobber syntax for it to remotely come close to working.

      1 reply →

    • Unsafe Rust actually has a great runtime analyzer: Miri. It's very easy to just run `cargo +nightly miri test` in your project to get some confidence in the more questionable choices along the way.

Building tools is one thing, building a system like Postgres or Databases is going to be another thing.

Anyone really tried building PG or MySQL or such a complex system which heavily relies on IO operations and multi threading capabilities

  • Look at how fanatic the compatibility actually is. Building Postgres or MySQL is conceivable but probably will require some changes. (SQLite compiles and runs with zero changes right now.)

djb uses a surprisingly low amount of RAM (12GB) considering my laptop already has 64G which is possible to expand to 128G in the future