Comment by pizlonator
2 years ago
The report gets it wrong. C and C++ can both be made memory safe with small changes. The cost of doing that is likely to be lower than the cost of either deploying CHERI or rewriting in Rust. And, the protections are likely to be stronger than what CHERI offers (CHERI tries really hard to just let existing C code do whatever the heck it does).
There's a ton of literature on ways to make C/C++ safe. I think that the only reason why that path isn't being explored more is that it's the "less fun" option - it doesn't involve blue sky thoughts about new hardware or new languages.
I think what you’re doing with Fil-C is cool, but I wouldn’t call a 200x slowdown a “small change.”
One of the interesting things that Rust has demonstrated is that you don’t have to choose between performance and safety and, in fact, that safety improvements in languages can actually result in faster programs (e.g. due to improved alias analysis). New technology/sexiness advantage aside, I think this is a significant driver of adoption.
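To make the alias-analysis point concrete: the optimization Rust gets automatically from `&mut` requires a manual, unchecked `restrict` promise in C. A minimal sketch (the function and names are just illustration):

```c
#include <stddef.h>

/* Without restrict, the compiler must assume dst and src may alias,
 * so it has to reload src[i] after every store to dst[i]. With
 * restrict -- an unchecked promise in C -- it can keep values in
 * registers and vectorize. Rust's &mut gives the same no-alias
 * guarantee automatically, and the borrow checker enforces it. */
void add_into(int *restrict dst, const int *restrict src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}
```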
> I think what you’re doing with Fil-C is cool, but I wouldn’t call a 200x slowdown a “small change.”
If you're bringing up the 200x, then you don't get what's going on.
It's extremely useful right now to have a compiler that's substantially correct so I don't have to deal with miscompiles as I grow the corpus.
Once I have a large enough corpus of tests, then I'll start optimizing. Writing compiler optimizations incrementally on top of a totally reliable compiler is just sensible engineering practice.
So, if you think that 200x is meaningful, then it's because you don't know how language/compiler development works, you haven't read my manifesto, and you have no idea where the 200x is coming from (hint: almost all optimizations are turned off for now so I have a reliable compiler to grow a corpus with).
> One of the interesting things that Rust has demonstrated is that you don’t have to choose between performance and safety and, in fact, that safety improvements in languages can actually result in faster programs (e.g. due to improved alias analysis). New technology/sexiness advantage aside, I think this is a significant driver of adoption.
You have to rewrite your code to use Rust. You don't have to rewrite your code to use Fil-C. So, Rust costs more, period. And it costs more in exactly the kind of way that cannot be fixed. Fil-C's perf can be fixed. The fact that Rust requires rewriting your code cannot be fixed.
We can worry about making Fil-C fast once there's a corpus of stuff that runs on it. Until then, saying speed is a shortcoming of Fil-C is an utterly disingenuous argument. I can't take you seriously if you're making that argument.
> So, if you think that 200x is meaningful, then it's because you don't know how language/compiler development works, you haven't read my manifesto, and you have no idea where the 200x is coming from (hint: almost all optimizations are turned off for now so I have a reliable compiler to grow a corpus with).
I actually did, the first day you made it public. A friend also sent it to me because you link my blog in it. Again, I think it's cool, and I'm going to keep following your progress, because I think Rust alone is not a panacea.
I've worked on and in LLVM for about 5 years now (and I've contributed to a handful of programming languages and runtimes over the past decade), so I feel comfortable saying that I know a bit about how compilers and language development work. Not enough to say that I'm an infallible expert, but enough to know that it's very hard to claw back performance when doing the kinds of things you're doing (isoheaps, caps). Isotyped heaps, in particular, are a huge pessimization on top of ordinary heap allocation, especially when you get into codebases with more than a few hundred unique types[1].
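For anyone who hasn't run into the term: an isoheap (isolated, type-segregated heap) gives each distinct type its own allocation pool, so freed memory is only ever recycled as the same type. A rough sketch of the shape of the problem -- purely illustrative, not Fil-C's actual data structures:

```c
#include <stddef.h>

/* One pool per unique type: memory freed here can only be handed out
 * again for this same type. That prevents use-after-free type
 * confusion, but fragmentation and metadata overhead multiply with
 * the number of distinct types in the codebase. */
struct iso_pool {
    size_t object_size;     /* fixed for the pool's type */
    void *free_list;        /* recycled objects, same type only */
    struct iso_pool *next;  /* a codebase with N types has N pools */
};
```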
To be clear: I don't think performance is a sufficient reason to not do memory safety. I've previously advocated for people running sanitizer-instrumented binaries in production, because the performance hit is often acceptable. But again: Rust gets you both performance and safety, and is increasingly the choice for shops that are looking to migrate off of their legacy codebases anyways. It's also easier to justify training a junior engineer to write safe code that can be integrated into a pre-existing codebase.
> You don't have to rewrite your code to use Fil-C.
If I read correctly, you provide an example of an enum below that needs to be rewritten for Fil-C. That's probably an acceptable tradeoff in many codebases, but it sounds like there are well-formed C programs that Fil-C currently rejects.
[1]: https://security.apple.com/blog/towards-the-next-generation-...
Do you have a forecast as to what the slowdown will be after optimizations are implemented? 20x? 2x? 1.2x? 1.02x?
Thanks.
What kind of small changes? It seems strange to me that other languages would bother implementing complicated garbage collectors and borrow checkers if all you need is a small change from C.
See here: https://github.com/pizlonator/llvm-project-deluge/blob/delug...
I just got the OpenSSH client to work last night.
Here's an example of the kinds of changes you have to make: https://github.com/pizlonator/deluded-openssh-portable/commi...
Most of the changes are just using zalloc and friends instead of malloc and friends. If I reaaaallly wanted to, I could have made it automatic (like, `malloc(sizeof(Foo))` could be interpreted by the compiler as being just `zalloc(Foo, 1)` ... I didn't do that because I sorta think it's too magical and C programmers don't like too much magic).
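To make that concrete, here's the flavor of the diff (`Foo` is a made-up example type; `zalloc`'s type-and-count shape matches the commit above, but treat the details as illustrative):

```c
#include <stdlib.h>  /* malloc; zalloc comes from Fil-C's runtime */

typedef struct { int fd; char name[64]; } Foo;

Foo *make_foo(void) {
    /* Before: the allocator knows nothing about what lives here.
     *     return malloc(sizeof(Foo));
     * After: the allocation names its type, so the isoheap can promise
     * this memory is only ever a Foo, even across free and reuse. */
    return zalloc(Foo, 1);
}
```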
How do you handle unions?
I'm also having a hard time fully convincing myself of this:
> the allocator will return a pointer to memory that had always been exactly that type. Use-after-free does not lead to type confusion in Fil-C
In the worst case, this seems to mean either that you can simply never reuse memory, or that we're discarding parts of the type. If I successively allocate integer arrays of growing lengths, it seems to me it must either return memory that had previously been used with a different type (e.g., an int[5] and an int[3] occupying the same memory at disjoint times), or address space usage in such a program is quadratic, or we're not considering array length as "part of the type", i.e., we're discarding it. (I'm not sure if that's acceptable or not. I think it should be fine, but I'll have to think harder.)
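Concretely, the pattern I have in mind (the `zalloc` spelling is taken from upthread; the loop itself is just illustration):

```c
#include <stddef.h>

void grow(size_t n) {
    /* int[1], int[2], ..., int[n]: if each length counts as its own
     * type, and memory may never be reused across types, total address
     * space grows as 1 + 2 + ... + n = O(n^2). */
    for (size_t len = 1; len <= n; len++) {
        int *p = zalloc(int, len);
        p[len - 1] = 42;  /* touch the array */
        /* Even if this is freed here, can the pages ever come back as
         * an int array of a different length, or as another type? */
    }
}
```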
Attaching capabilities to pointers is sort of what CHERI does, isn't it? And presumably CHERI can have better performance thanks to the direct hardware support. (Your manifesto mentions a 200x performance impact currently.)
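(By "capabilities" I mean fat pointers: an address carried together with bounds metadata. Purely as an illustration of the concept, not either project's real layout:)

```c
/* A software capability: every dereference can be bounds-checked
 * against base/limit. CHERI keeps this metadata in tagged hardware
 * registers; a software scheme has to maintain and check it in
 * generated code, which is where the overhead comes from. */
typedef struct {
    char *addr;   /* current pointer value */
    char *base;   /* lowest valid address */
    char *limit;  /* one past the highest valid address */
} capability;
```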