Comment by kragen

11 days ago

Hmm, Fil-C seems potentially really important; there's a lot of software that only exists in the form of C code which it's important to preserve access to, even if the tradeoffs made by conventional C compilers (accepting large risks of security problems in exchange for a small improvement in single-core performance) have largely become obsolete.

The list of supported software is astounding: CPython, SQLite, OpenSSH, ICU, CMake, Perl5, and Bash, for example. There are a lot of things in that list that nobody is likely to ever rewrite in Rust.

I wonder if it's feasible to use Fil-C to do multitasking between mutually untrusted processes on a computer without an MMU? They're making all the right noises about capability security and nonblocking synchronization and whatnot.

Does anyone have experience using it in practice? I see that https://news.ycombinator.com/item?id=45134852 reports a 4× slowdown or better.

The name is hilarious. Feelthay! Feelthay!

> I wonder if it's feasible to use Fil-C to do multitasking between mutually untrusted processes on a computer without an MMU?

You could. That said, FUGC’s guts rely on OS features that in turn rely on an MMU.

But you could make a version of FUGC that has no such dependency.

As for perf - 4x is the worst case and that number is out there because I reported it. And I report worst case perf because that’s how obsessive I am about realistically measuring, and then fanatically resolving, perf issues

Fact is, I can live on the Fil-C versions of a lot of my favorite software and not tell the difference

  • > As for perf - 4x is the worst case and that number is out there because I reported it

    I love the concept of Fil-C but I find that with the latest release, a Fil-C build of QuickJS executes bytecode around 30x slower than a regular build. Admittedly this is an informal benchmark running on a GitHub CI runner. I’m not sure if virtualization introduces overheads that Fil-C might be particularly sensitive to (?). But I’ve sadly yet to see anything close to a 4x performance difference. Perhaps I will try running the same benchmark on native non-virtualized x86 later today.

    Also, so I am not just whining, my Fil-C patch to the QuickJS main branch contains a fix for an issue that’s only triggered by regex backtracking, and which I think you might have missed in your own QuickJS patch:

    http://github.com/addrummond/jsockd/blob/main/fil-c-quickjs....

    • 30x? Oof

      I know that I regressed quickjs recently when I fixed handling of unions. It’s a fixable issue, I just haven’t gone back and fixed it yet.

      I definitely don’t see 30x overhead on anything else I run.

      But thanks for pointing that out, I should probably actually fix the union handling the right way.

      (What’s happening is that every time QuickJS bit-casts doubles to pointers, that currently does a heap allocation, which is obviously not needed. The simplest compiler analysis would kill it. I just turned off the previous instance of that analysis because it had a soundness issue.)
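      For readers unfamiliar with the pattern, here is a minimal sketch of a value slot that stores either a double or a pointer in the same bytes, so that writing one member and reading the other amounts to a bit cast. The names `slot_t`, `value_t`, and the helper functions are hypothetical and are not QuickJS's actual definitions. The sketch only shows the C idiom and makes no claim about how Fil-C compiles it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot: the same bytes hold either a double or a pointer. */
typedef union {
    double f64;
    void *ptr;
} slot_t;

/* Hypothetical tagged value: the flag records which member is live. */
typedef struct {
    slot_t u;
    int is_ptr;
} value_t;

static value_t value_from_double(double d) {
    value_t v;
    v.u.f64 = d;
    v.is_ptr = 0;
    return v;
}

static value_t value_from_ptr(void *p) {
    value_t v;
    v.u.ptr = p;
    v.is_ptr = 1;
    return v;
}

static double value_as_double(value_t v) {
    assert(!v.is_ptr);
    return v.u.f64;
}

static void *value_as_ptr(value_t v) {
    assert(v.is_ptr);
    return v.u.ptr;
}

/* The bit cast itself: write the double member, read the pointer member.
   C permits this kind of type punning through a union. */
static void *double_bits_as_ptr(double d) {
    slot_t s;
    s.f64 = d;
    return s.ptr;
}
```

      With a conventional compiler this is just a reinterpretation of the same bytes with no allocation involved, which is why the extra heap allocation described above is pure overhead.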

  • A Fil-C kernel that ran the whole system in the same address space, safely, would sure be something. Getting rid of the overhead of hardware isolation could compensate for some of the overhead of the software safety checks. That was the dream of Microsoft's Singularity project back in the day.

    I guess there would be no way to verify that precompiled user programs actually enforce the security boundaries. The only way to guarantee safety in such a system would be to compile everything from source yourself.

    • This is what IBM i [1] (AKA AS/400) does, I think.

      IBM i applications are compiled to a hardware-independent intermediate representation called TIMI, which the SLIC (kernel) can then compile down to machine code, usually at program installation time. As the SLIC is also responsible for maintaining system security, there's no way for a malicious user to sneak in a noncompliant program.

      [1] https://en.wikipedia.org/wiki/IBM_i

  • How would you go about writing a program/function that runs as close to native speed as possible on Fil-C?

    How much more memory do GC programs tend to use?

    Curious, how do you deal with interior pointers, and not being able to store type info in object headers, like most GC languages do (considering placement new is a thing, you can't have malloc allocate a header then return the following memory, and pointer types can lie about what they contain)?

    You mention 'accurate' by which I assume you use the compiler to keep track of where the pointers are (via types/stackmaps).

    How do you deal with pointers that get cast to ints, and then back?
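    To make the question concrete, here is a minimal sketch in plain C of two of the patterns being asked about: an interior pointer and a pointer that round-trips through an integer. The type `item_t` and both helper functions are hypothetical. The code shows how these patterns behave with a conventional compiler and makes no claim about how Fil-C treats them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical object type used only for illustration. */
typedef struct {
    int id;
    double weight;
} item_t;

/* Interior pointer: points into the middle of an object, not at its start.
   A collector that sees only this pointer must still keep the whole
   object alive. */
static double *interior_pointer(item_t *it) {
    return &it->weight;
}

/* Pointer -> integer -> pointer round trip. With a conventional compiler
   the integer carries the whole address, so the result compares equal to
   the original pointer. */
static item_t *round_trip(item_t *it) {
    uintptr_t bits = (uintptr_t)it;
    item_t *back = (item_t *)bits;
    assert(back == it);
    return back;
}
```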

    • > How would you go about writing a program/function that runs as close to native speed as possible on Fil-C?

      Avoid pointer chasing. Use SIMD.

      > How much more memory do GC programs tend to use?

      I would estimate 2x

      Fil-C has additional overheads not related to GC, so maybe it’s higher. I haven’t started measuring and optimizing memory use in anger.

      > Curious, how do you deal with interior pointers, and not being able to store type info in object headers, like most GC languages do (considering placement new is a thing, you can't have malloc allocate a header then return the following memory, and pointer types can lie about what they contain)?

      See https://fil-c.org/invisicaps

  • When you run the Fil-C versions of your favourite software, does it have a sanitizer mode that reports bugs like missing free() etc? And have you found any bugs this way?

  • Yeah, I meant to be clear that 4× was the worst case, and I think it's an impressive achievement already, and perfectly fine for almost everything. After all, running single-threaded software on an 8-core CPU is already an 8× slowdown, right? And people do that all the time!

    What's the minimal code size overhead for FUGC?

    • > What's the minimal code size overhead for FUGC?

      I don’t have good data on this.

      The FUGC requires the compiler to emit extra metadata and that metadata is hilariously inefficient right now. I haven’t bothered to optimize it. And the FUGC implementation pulls in all of libpas even though it almost certainly doesn’t have to.

      So I don’t know what minimal looks like right now

> The list of supported software is astounding: CPython, SQLite, OpenSSH, ICU, CMake, Perl5, and Bash, for example. There are a lot of things in that list that nobody is likely to ever rewrite in Rust.

Interestingly, I agree with your point in general that there's a lot of software that Fil-C might be a good fit for, but I hesitate to say that about any of the examples you listed:

* CPython and Perl5 are the runtimes for notoriously slow GCed languages, and adding the overhead of a second GC seems...inelegant at best, and likely to slow things down a fair bit more.

* Some of them do have reimplementations or viable alternatives in Rust (or Go or the like) underway, like Turso for SQLite.

* More generally, I'd call these foundational, widely-used, actively maintained pieces of software, so it seems plausible to me that they will decide to RiiR (rewrite it in Rust).

I think the best fit may be for stuff that's less actively maintained and less performance-critical. There's 50 years of C programs that people still dig out of the attic sometimes but aren't putting much investment into, and they're running on hardware vastly more powerful than what these programs were written for.

  • Yeah, for that reason perhaps Perl5 is a better example than CPython, but something less widely used might be a better example. tcsh, say.

Note that the power of SQLite being written in C is its portability to non-standard OSes. [0] I've used it on an embedded real-time μC/OS-II variant. [1]

The architecture of embedded solutions is different from that of desktop and server software. For example, to prevent memory fragmentation and keep performance high, do not free memory. Instead, mark that memory (object / struct) as reusable. It is similar to customized heap allocation or pooling.
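A minimal sketch of that "never free, mark as reusable" approach, assuming fixed-size objects and a statically allocated array. The names `msg_t`, `pool_init`, `pool_acquire`, `pool_release`, and the capacity of 4 are hypothetical and chosen only for illustration. The sketch is single-threaded and has no locking, which a real RTOS task setup would likely need.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_CAPACITY 4

/* Hypothetical fixed-size object. It is never returned to a heap;
   it is only marked reusable and linked back onto a free list. */
typedef struct msg {
    struct msg *next_free;      /* link used only while the slot is free */
    int in_use;                 /* 1 while handed out, 0 while reusable */
    unsigned char payload[32];
} msg_t;

static msg_t pool_storage[POOL_CAPACITY];   /* all memory reserved up front */
static msg_t *pool_free_list;

/* Put every slot on the free list. Call once at startup. */
static void pool_init(void) {
    pool_free_list = NULL;
    for (size_t i = 0; i < POOL_CAPACITY; i++) {
        pool_storage[i].in_use = 0;
        pool_storage[i].next_free = pool_free_list;
        pool_free_list = &pool_storage[i];
    }
}

/* Hand out a reusable slot, or NULL if the pool is exhausted. */
static msg_t *pool_acquire(void) {
    msg_t *m = pool_free_list;
    if (m == NULL) {
        return NULL;
    }
    pool_free_list = m->next_free;
    m->next_free = NULL;
    m->in_use = 1;
    return m;
}

/* Mark a slot reusable instead of freeing it. */
static void pool_release(msg_t *m) {
    assert(m != NULL && m->in_use);
    m->in_use = 0;
    m->next_free = pool_free_list;
    pool_free_list = m;
}
```

Because every slot has the same size and lives in one static array, acquiring and releasing are constant-time and the memory cannot fragment. The cost is a hard upper bound on the number of live objects.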

[0] https://sqlite.org/vfs.html

[1] https://en.wikipedia.org/wiki/Micro-Controller_Operating_Sys...

> There are a lot of things in that list that nobody is likely to ever rewrite in Rust.

How many years away are we from having AI-enhanced static analysis tools that can accurately look at our C code (after the fact or while we're writing it) and say "this will cause problems, here's a fix" with a level of accuracy sufficient that we can just continue using C?

> I wonder if it's feasible to use Fil-C to do multitasking between mutually untrusted processes on a computer without an MMU?

Even if it worked for normal data flow, that's the sort of thing that's bound to introduce covert channels, I'd have thought. To start with, I guess you have immediately disabled the Meltdown/Spectre mitigations, because don't those happen when you switch processes?

  • Yes, it definitely will not work to plug covert channels or side-channel attacks like Spectre. Typically, computers without MMUs also don't have speculative execution, or in most cases even caches, so Spectre specifically wouldn't be relevant, but lots of other timing side channels would. Maybe other side channels like EMI and power consumption as well.

    But consider, for example, decoding JPEG, or maybe some future successor to JPEG, JEEG, by the Joint Evil Experts Group. You want to look at a ransom note that someone in the JEEG has sent you in JEEG format so that you know how much Bitcoin to send them. You have a JEEG decoder, but it was written by Evil Experts, so it might have vulnerabilities, as JPEG implementations have in the past, and maybe the ransom note JEEG is designed to overflow a buffer in it and install a rootkit. Maybe the decoder itself is full of malicious code just waiting for the signal to strike!

    If you can run the JEEG decoder in a container that keeps it from accessing the network, writing to the filesystem, launching processes, executing forever, allocating all your RAM, etc., only being permitted to output an uncompressed image, even if you let it read the clock, it probably doesn't matter if it launches some kind of side-channel attack against your Bitcoin wallet and your Bitchat client, because all it can do is put the information it stole into the image you are going to look at and then discard.

    You can contrive situations where it can still trick you into leaking bits it stole back to the JEEG (maybe the least significant bits of the ransom amount) but it's an enormous improvement over the usual situation.

    Then, FalseType fonts...

    • Well, they may not have speculative execution, but some of them do have branch prediction these days, which probably leaks a certain amount of information. E.g., the Cortex-M7 (no MMU, MPU optional, has branch prediction).

With improvements in coding agents, rewriting code in Rust is pretty damn easy, and with a battle-tested reference implementation, it should be easy to make something solid. I wouldn't be surprised if we have full rewrites of everything in Rust in the next few years, just because it'll be so easy.

  • I have had better experiences with LLMs translating code from one language to another than writing code from scratch, but I don't think the current state of LLMs makes it "pretty damn easy" to rewrite code in Rust, especially starting from garbage-collected languages like Perl or Lua.

    Certainly it's plausible that in the next few years it'll be pretty damn easy, but with the rapid and unpredictable development of AI, it's also plausible that humanity will be extinct or that all current programming languages will be abandoned.

    • In the last day I've rewritten two service hot cores in Rust using agents, and gotten speedups from 4x to >400x (SIMD + precise memory management) and gotten fully equivalent test coverage basically out of the gate from agent rewrites. So I'd say my experience has been overwhelmingly positive, and while I might be ahead of the curve in terms of AI engineering ability, this capability will come to everyone soon with better models/tools.

  • I don’t buy it but let’s say that in the best case this happens.

    Then we’ll have a continuation of the memory safety exploit dumpster fire because these Rust ports tend to use a significant amount of unsafe code.

    On the other hand, Fil-C has no unsafe escape hatches.

    Think of Fil-C as the more secure but slower/heavier alternative to Rust

    • Hmm, maybe this should be on the project's homepage: recompiling with Fil-C is a more secure but slower and more-memory-consuming alternative to rewriting in Rust.

    • By default you are right. However you can use static analysis and tooling guardrails to reject certain classes of unsafe code automatically, and force the agent to go back to the drawing board. It might take a few tries and a tiny amount of massaging but I don't doubt it'd get there.

SQLite in Rust https://github.com/tursodatabase/turso

CPython in Rust https://github.com/RustPython/RustPython

Bash in Rust https://github.com/shellgei/rusty_bash

  • Turso says:

    > Warning: This software is ALPHA, only use for development, testing, and experimentation. We are working to make it production ready, but do not use it for critical data right now.

    https://rustpython.github.io/pages/whats-left says:

    > RustPython currently supports the full Python syntax. This is “what’s left” from the Python Standard Library.

    Rusty_bash says:

    > Currently, the binary built from alpha repo has passed 24 of 84 test scripts.

    The CPython implementation is farther along than I had expected! I hope they make more progress.

    • You're getting downvoted because nobody likes pedantry.

      Especially for the Turso project: if you look under "Insights -> Contributors" on their GitHub page, it's clear that the project is under heavy active development, and they have an actual funded startup that wants to sell access to a cloud version of Turso, so they are definitely incentivized to complete it.

      SQLite was built by three people, and has a stable and well-defined interface and file format. This seems like an actually tractable project to re-implement if you have enough man-years of funding and a talented enough dev team. Turso seems like they could fit the bill.
