Comment by kragen

11 days ago

Hmm, Fil-C seems potentially really important; there's a lot of software that only exists in the form of C code which it's important to preserve access to, even if the tradeoffs made by conventional C compilers (accepting large risks of security problems in exchange for a small improvement in single-core performance) have largely become obsolete.

The list of supported software is astounding: CPython, SQLite, OpenSSH, ICU, CMake, Perl5, and Bash, for example. There are a lot of things in that list that nobody is likely to ever rewrite in Rust.

I wonder if it's feasible to use Fil-C to do multitasking between mutually untrusted processes on a computer without an MMU? They're making all the right noises about capability security and nonblocking synchronization and whatnot.

Does anyone have experience using it in practice? I see that https://news.ycombinator.com/item?id=45134852 reports a 4× slowdown or better.

The name is hilarious. Feelthay! Feelthay!

97 comments

pizlonator 11 days ago

> I wonder if it's feasible to use Fil-C to do multitasking between mutually untrusted processes on a computer without an MMU?

You could. That said, FUGC’s guts rely on OS features that in turn rely on an MMU.

But you could make a version of FUGC that has no such dependency.

As for perf - 4x is the worst case, and that number is out there because I reported it. And I report worst-case perf because that’s how obsessive I am about realistically measuring, and then fanatically resolving, perf issues.

Fact is, I can live on the Fil-C versions of a lot of my favorite software and not tell the difference.

foldr 11 days ago
> As for perf - 4x is the worst case and that number is out there because I reported it
I love the concept of Fil-C but I find that with the latest release, a Fil-C build of QuickJS executes bytecode around 30x slower than a regular build. Admittedly this is an informal benchmark running on a GitHub CI runner. I’m not sure if virtualization introduces overheads that Fil-C might be particularly sensitive to (?). But I’ve sadly yet to see anything close to a 4x performance difference. Perhaps I will try running the same benchmark on native non-virtualized x86 later today.
Also, so I am not just whining, my Fil-C patch to the QuickJS main branch contains a fix for an issue that’s only triggered by regex backtracking, and which I think you might have missed in your own QuickJS patch:
http://github.com/addrummond/jsockd/blob/main/fil-c-quickjs....
- pizlonator 11 days ago
  
  30x? Oof
  I know that I regressed quickjs recently when I fixed handling of unions. It’s a fixable issue, I just haven’t gone back and fixed it yet.
  I definitely don’t see 30x overhead on anything else I run.
  But thanks for pointing that out, I should probably actually fix the union handling the right way.
  (What’s happening is every time quickjs bit casts doubles to pointers, that’s currently doing a heap allocation. And it’s obviously not needed. The simplest compiler analysis would kill it. I just turned off the previous instance of that analysis because it had a soundness issue)
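  For readers unfamiliar with the pattern being described, here is a minimal sketch of a double-to-pointer bit cast through a union. It is an illustration only: `value_payload`, `double_as_pointer`, `pointer_as_double`, and `double_bits` are hypothetical names, not QuickJS or Fil-C code, and it assumes a 64-bit target where `double` and `void *` have the same size.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical payload cell: one slot that holds either a double or a
 * pointer. Assumes sizeof(double) == sizeof(void *) (64-bit target). */
typedef union {
    double float64;
    void  *ptr;
} value_payload;

/* Store a double, then read the same bits back through the pointer member.
 * This is the "bit cast double to pointer" step discussed above. */
void *double_as_pointer(double d) {
    value_payload v;
    v.float64 = d;
    return v.ptr;
}

/* The reverse direction: store a pointer, read the bits back as a double. */
double pointer_as_double(void *p) {
    value_payload v;
    v.ptr = p;
    return v.float64;
}

/* Helper to inspect the raw bits of a double without going through a pointer. */
uint64_t double_bits(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

  With a conventional compiler the round trip `pointer_as_double(double_as_pointer(x))` is just a register move; the comment above says that in the current Fil-C release this kind of cast costs a heap allocation until the compiler analysis is restored.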
  
- kragen 11 days ago
  
  I look forward to seeing how this shakes out. Fanatically, I hope?
modeless 11 days ago
A Fil-C kernel that ran the whole system in the same address space, safely, would sure be something. Getting rid of the overhead of hardware isolation could compensate for some of the overhead of the software safety checks. That was the dream of Microsoft's Singularity project back in the day.
I guess there would be no way to verify that precompiled user programs actually enforce the security boundaries. The only way to guarantee safety in such a system would be to compile everything from source yourself.
- miki123211 11 days ago
  
  This is what IBM i[1] (AKA AS/400) does, I think.
  IBM i applications are compiled to a hardware-independent intermediate representation called TIMI, which the SLIC (kernel) can then compile down to machine code, usually at program installation time. As the SLIC is also responsible for maintaining system security, there's no way for a malicious user to sneak in a noncompliant program.
  [1] https://en.wikipedia.org/wiki/IBM_i
  
- pizlonator 11 days ago
  
  You could enforce that binaries follow Fil-C rules using proof-carrying code.
  
torginus 11 days ago
How would you go about writing a program/function that runs as close to native speed as possible on Fil-C?
How much more memory do GC programs tend to use?
Curious, how do you deal with interior pointers, and not being able to store type info in object headers, like most GC languages do (considering placement new is a thing, you can't have malloc allocate a header then return the following memory, and pointer types can lie about what they contain)?
You mention 'accurate' by which I assume you use the compiler to keep track of where the pointers are (via types/stackmaps).
How do you deal with pointers that get cast to ints, and then back?
- pizlonator 11 days ago
  
  > How would you go about writing a program/function that runs as close to native speed as possible on Fil-C?
  Avoid pointer chasing. Use SIMD.
  > How much more memory do GC programs tend to use?
  I would estimate 2x
  Fil-C has additional overheads not related to GC, so maybe it’s higher. I haven’t started measuring and optimizing memory use in anger.
  > Curious, how do you deal with interior pointers, and not being able to store type info in object headers, like most GC languages do (considering placement new is a thing, you can't have malloc allocate a header then return the following memory, and pointer types can lie about what they contain)?
  See https://fil-c.org/invisicaps
willvarfar 11 days ago
When you run the Fil-C versions of your favourite software, does it have a sanitizer mode that reports bugs like missing free() etc? And have you found any bugs this way?
- pizlonator 11 days ago
  
  Well missing free is just swallowed by the GC - the leak gets fixed without any message.
  I have found bugs in the software that I’ve ported, yeah.
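  As a concrete picture of the "missing free" case, here is a small sketch. `count_vowels_leaky` is a hypothetical function written for this illustration, not code from any ported project. Under a conventional toolchain the scratch buffer leaks on every call; per the comment above, under Fil-C the garbage collector would reclaim it without reporting anything.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Counts vowels in s using a heap-allocated scratch copy.
 * The copy is never freed: this is the deliberate "missing free()" bug. */
size_t count_vowels_leaky(const char *s) {
    size_t n = strlen(s);
    char *copy = malloc(n + 1);
    if (copy == NULL) {
        return 0; /* allocation failed; report no vowels */
    }
    memcpy(copy, s, n + 1);

    size_t vowels = 0;
    for (size_t i = 0; i < n; i++) {
        /* i < n, so copy[i] is never the terminating NUL */
        if (strchr("aeiouAEIOU", copy[i]) != NULL) {
            vowels++;
        }
    }
    /* BUG (intentional): free(copy) is missing here. */
    return vowels;
}
```

  The function still returns the right answer, which is why a leak like this is easy to miss without a dedicated leak checker.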
  
kragen 11 days ago
Yeah, I meant to be clear that 4× was the worst case, and I think it's an impressive achievement already, and perfectly fine for almost everything. After all, running single-threaded software on an 8-core CPU is already an 8× slowdown, right? And people do that all the time!
What's the minimal code size overhead for FUGC?
- pizlonator 11 days ago
  
  > What's the minimal code size overhead for FUGC?
  I don’t have good data on this.
  The FUGC requires the compiler to emit extra metadata and that metadata is hilariously inefficient right now. I haven’t bothered to optimize it. And the FUGC implementation pulls in all of libpas even though it almost certainly doesn’t have to.
  So I don’t know what minimal looks like right now
  

scottlamb 11 days ago

> The list of supported software is astounding: CPython, SQLite, OpenSSH, ICU, CMake, Perl5, and Bash, for example. There are a lot of things in that list that nobody is likely to ever rewrite in Rust.

Interestingly, I agree with your point in general that there's a lot of software that Fil-C might be a good fit for, but I hesitate to say that about any of the examples you listed:

* CPython and Perl5 are the runtimes for notoriously slow GCed languages, and adding the overhead of a second GC seems...inelegant at best, and likely to slow things down a fair bit more.

* Some of them do have reimplementations or viable alternatives in Rust (or Go or the like) underway, like Turso for SQLite.

* More generally, I'd call these foundational, widely-used, actively maintained pieces of software, so it seems plausible to me that they will decide to RiiR.

I think the best fit may be for stuff that's less actively maintained and less performance-critical. There's 50 years of C programs that people still dig out of the attic sometimes but aren't putting much investment into, and that run on hardware vastly more powerful than what they were written for.

kragen 10 days ago

Yeah, for that reason perhaps Perl5 is a better example than CPython, but something less widely used might be a better example. tcsh, say.

yndoendo 11 days ago

Note that the power of SQLite being written in C is its portability to non-standard OSes. [0] I've used it on an embedded real-time μC/OS-II variant. [1]

The architecture of embedded solutions is different from that of desktop and server software. For example, to prevent memory fragmentation and keep performance high, do not free memory. Instead, mark that memory (object / struct) as reusable. It is similar to customized heap allocation or pooling.
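
The "mark as reusable instead of freeing" idea can be sketched as a fixed-size object pool. This is a minimal illustration under stated assumptions, not code from SQLite or μC/OS-II: `message`, `POOL_SIZE`, `pool_acquire`, and `pool_release` are hypothetical names, and the sketch is single-threaded with no locking.

```c
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4 /* hypothetical capacity, fixed at compile time */

/* Hypothetical pooled object; in_use is the "reusable" marker. */
typedef struct {
    int  id;
    char payload[32];
    bool in_use;
} message;

/* Static storage: never malloc'd, never freed, so the heap cannot fragment. */
static message pool[POOL_SIZE];

/* Hand out the first slot not currently in use, or NULL if the pool is full. */
message *pool_acquire(void) {
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = true;
            return &pool[i];
        }
    }
    return NULL;
}

/* "Free" a slot by marking it reusable; the memory itself stays in place. */
void pool_release(message *m) {
    if (m != NULL) {
        m->in_use = false;
    }
}
```

A caller pairs each `pool_acquire()` with a later `pool_release()`. Exhaustion shows up as a `NULL` return rather than as fragmentation, which makes the worst case easy to reason about on a small system.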

[0] https://sqlite.org/vfs.html [1] https://en.wikipedia.org/wiki/Micro-Controller_Operating_Sys...

kragen 11 days ago

Probably worth pointing out that Fil-C doesn't yet support 32-bit systems (or presumably 16-bit or 8-bit systems): https://fil-c.org/invisicaps

justin66 11 days ago

> There are a lot of things in that list that nobody is likely to ever rewrite in Rust.

How many years away are we from having AI-enhanced static analysis tools that can accurately look at our C code (after the fact or while we're writing it) and say "this will cause problems, here's a fix" with a level of accuracy sufficient that we can just continue using C?

kragen 10 days ago

I don't think anybody can predict that.

ajb 11 days ago

"I wonder if it's feasible to use Fil-C to do multitasking between mutually untrusted processes on a computer without an MMU?"

Even if it worked for normal data flow, that's the sort of thing that's bound to introduce covert channels, I'd have thought. To start with, I guess you have immediately disabled the mitigations for Meltdown/Spectre, because don't those happen when you switch processes?

kragen 11 days ago
Yes, it definitely will not work to plug covert channels or side-channel attacks like Spectre. Typically, computers without MMUs also don't have speculative execution, or in most cases even caches, so Spectre specifically wouldn't be relevant, but lots of other timing side channels would. Maybe other side channels like EMI and power consumption as well.
But consider, for example, decoding JPEG, or maybe some future successor to JPEG, JEEG, by the Joint Evil Experts Group. You want to look at a ransom note that someone in the JEEG has sent you in JEEG format so that you know how much Bitcoin to send them. You have a JEEG decoder, but it was written by Evil Experts, so it might have vulnerabilities, as JPEG implementations have in the past, and maybe the ransom note JEEG is designed to overflow a buffer in it and install a rootkit. Maybe the decoder itself is full of malicious code just waiting for the signal to strike!
If you can run the JEEG decoder in a container that keeps it from accessing the network, writing to the filesystem, launching processes, executing forever, allocating all your RAM, etc., only being permitted to output an uncompressed image, even if you let it read the clock, it probably doesn't matter if it launches some kind of side-channel attack against your Bitcoin wallet and your Bitchat client, because all it can do is put the information it stole into the image you are going to look at and then discard.
You can contrive situations where it can still trick you into leaking bits it stole back to the JEEG (maybe the least significant bits of the ransom amount) but it's an enormous improvement over the usual situation.
Then, FalseType fonts...
- ajb 11 days ago
  
  Well, they may not have speculative execution, but some of them do have branch prediction these days, which probably leaks a certain amount of information. E.g., the Cortex-M7 (no MMU, optional MPU, has branch prediction).
  

CuriouslyC 11 days ago

With improvements in coding agents, rewriting code in Rust is pretty damn easy, and with a battle-tested reference implementation, it should be easy to make something solid. I wouldn't be surprised if we have full rewrites of everything in Rust in the next few years, just because it'll be so easy.

kragen 11 days ago
I have had better experiences with LLMs translating code from one language to another than writing code from scratch, but I don't think the current state of LLMs makes it "pretty damn easy" to rewrite code in Rust, especially starting from garbage-collected languages like Perl or Lua.
Certainly it's plausible that in the next few years it'll be pretty damn easy, but with the rapid and unpredictable development of AI, it's also plausible that humanity will be extinct or that all current programming languages will be abandoned.
- CuriouslyC 11 days ago
  
  In the last day I've rewritten two service hot cores in Rust using agents, gotten speedups from 4x to >400x (SIMD + precise memory management), and gotten fully equivalent test coverage basically out of the gate from the agent rewrites. So I'd say my experience has been overwhelmingly positive, and while I might be ahead of the curve in terms of AI engineering ability, this capability will come to everyone soon with better models/tools.
  
pizlonator 11 days ago
I don’t buy it but let’s say that in the best case this happens.
Then we’ll have a continuation of the memory safety exploit dumpster fire because these Rust ports tend to use a significant amount of unsafe code.
On the other hand, Fil-C has no unsafe escape hatches.
Think of Fil-C as the more secure but slower/heavier alternative to Rust
- kragen 11 days ago
  
  Hmm, maybe this should be on the project's homepage: recompiling with Fil-C is a more secure but slower and more-memory-consuming alternative to rewriting in Rust.
  
- CuriouslyC 11 days ago
  
  By default you are right. However you can use static analysis and tooling guardrails to reject certain classes of unsafe code automatically, and force the agent to go back to the drawing board. It might take a few tries and a tiny amount of massaging but I don't doubt it'd get there.
  
- baranul 5 days ago
  
  Seems like this kind of talk might scare certain invested people.

odie5533 11 days ago

SQLite in Rust https://github.com/tursodatabase/turso

CPython in Rust https://github.com/RustPython/RustPython

Bash in Rust https://github.com/shellgei/rusty_bash

kragen 11 days ago
Turso says:
> Warning: This software is ALPHA, only use for development, testing, and experimentation. We are working to make it production ready, but do not use it for critical data right now.
https://rustpython.github.io/pages/whats-left says:
> RustPython currently supports the full Python syntax. This is “what’s left” from the Python Standard Library.
Rusty_bash says:
> Currently, the binary built from alpha repo has passed 24 of 84 test scripts.
RustPython is farther along than I had expected! I hope they make more progress.
- Sammi 11 days ago
  
  You're getting downvoted because nobody likes pedantry.
  Especially for the Turso project: if you look under "Insights -> Contributors" on their GitHub page, it's clear that the project is under heavy active development, and they have an actual funded startup that wants to sell access to a cloud version of Turso, so they are definitely incentivized to complete it.
  SQLite was built by three people, and has a stable and well-defined interface and file format. This seems like an actually tractable project to re-implement if you have enough man-years of funding and a talented enough dev team. Turso seems like they could fit the bill.
  