Comment by woodruffw

2 years ago

> So, if you think that 200x is meaningful, then it's because you don't know how language/compiler development works, you haven't read my manifesto, and you have no idea where the 200x is coming from (hint: almost all optimizations are turned off for now so I have a reliable compiler to grow a corpus with).

I actually did, the first day you made it public. A friend also sent it to me because you link my blog in it. Again, I think it's cool, and I'm going to keep following your progress, because I think Rust alone is not a panacea.

I've worked on and in LLVM for about 5 years now (and I've contributed to a handful of programming languages and runtimes over the past decade), so I feel comfortable saying that I know a bit about how compilers and language development work. Not enough to say that I'm an infallible expert, but enough to know that it's very hard to claw back performance when doing the kinds of things you're doing (isoheaps, caps). Isotyped heaps, in particular, are a huge pessimization on top of ordinary heap allocation, especially when you get into codebases with more than a few hundred unique types[1].
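
As a rough illustration of where the memory cost of type isolation comes from, here is a minimal sketch of a per-type pool allocator. It is an assumption-laden toy, not Fil-C's allocator or any production design, and every name in it (`iso_pool`, `iso_alloc`, `iso_free`) is hypothetical. The point it shows is that a freed slot can only be reused by the pool, and so the type, it came from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical type-isolated ("isoheap") allocator: one pool per type.
   A freed slot goes onto its own pool's free list and is only ever
   handed back out by that same pool. */

typedef struct free_slot {
    struct free_slot *next;
} free_slot;

typedef struct iso_pool {
    size_t slot_size;     /* bytes per slot for this pool's type */
    free_slot *free_list; /* idle slots, reusable only by this pool */
    size_t live;          /* slots currently handed out */
    size_t reserved;      /* slots ever obtained from malloc */
} iso_pool;

static void iso_pool_init(iso_pool *pool, size_t object_size) {
    /* An idle slot must be large enough to hold the free-list link. */
    pool->slot_size =
        object_size < sizeof(free_slot) ? sizeof(free_slot) : object_size;
    pool->free_list = NULL;
    pool->live = 0;
    pool->reserved = 0;
}

static void *iso_alloc(iso_pool *pool) {
    void *slot;
    if (pool->free_list != NULL) {
        /* Reuse a slot previously freed from this same pool. */
        slot = pool->free_list;
        pool->free_list = pool->free_list->next;
    } else {
        /* No idle slot of this type: grow the pool, even if other
           pools are holding idle slots of the same size. */
        slot = malloc(pool->slot_size);
        if (slot == NULL) {
            return NULL;
        }
        pool->reserved++;
    }
    pool->live++;
    return slot;
}

static void iso_free(iso_pool *pool, void *ptr) {
    assert(ptr != NULL && pool->live > 0);
    free_slot *slot = ptr;
    /* The slot stays with its pool; it is never returned to malloc. */
    slot->next = pool->free_list;
    pool->free_list = slot;
    pool->live--;
}
```

With one pool per type, an object freed from pool A leaves an idle slot that an allocation from pool B cannot use, even when both types have the same size. Each pool carries its own slack, so the total grows with the number of distinct types in the program.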

To be clear: I don't think performance is a sufficient reason to not do memory safety. I've previously advocated for people running sanitizer-instrumented binaries in production, because the performance hit is often acceptable. But again: Rust gets you both performance and safety, and is increasingly the choice for shops that are looking to migrate off of their legacy codebases anyways. It's also easier to justify training a junior engineer to write safe code that can be integrated into a pre-existing codebase.

> You don't have to rewrite your code to use Fil-C.

If I read correctly, you provide an example of an enum below that needs to be rewritten for Fil-C. That's probably an acceptable tradeoff in many codebases, but it sounds like there are well-formed C programs that Fil-C currently rejects.

[1]: https://security.apple.com/blog/towards-the-next-generation-...

> I've worked on and in LLVM for about 5 years now (and I've contributed to a handful of programming languages and runtimes over the past decade), so I feel comfortable saying that I know a bit about how compilers and language development work. Not enough to say that I'm an infallible expert, but enough to know that it's very hard to claw back performance when doing the kinds of things you're doing (isoheaps, caps). Isotyped heaps, in particular, are a huge pessimization on top of ordinary heap allocation, especially when you get into codebases with more than a few hundred unique types[1].

Isoheaps suck a lot more in the kernel than they do in userspace. I don't think it's accurate to say that isoheaps are a "huge pessimization". It's not huge, that's for sure.

For sure, right now, memory usage of Fil-C is just not an issue. The cost of isoheaps is not an issue.

Also, Fil-C is engineered to allow GC, and I haven't made the switch because there are some good reasons not to do it. That's an example of something where I want to pick based on data. I'll pick GC or not depending on what performs better and is most ergonomic for folks, and that's the kind of choice best made after I have a massive corpus.

> If I read correctly, you provide an example of an enum below that needs to be rewritten for Fil-C. That's probably an acceptable tradeoff in many codebases, but it sounds like there are well-formed C programs that Fil-C currently rejects.

Yeah but it's not a rewrite.

If you want to switch to Rust, it's not a matter of changing a union - it's changing everything.

If you want to switch to Fil-C, then yeah, some of your unions, and most of your mallocs, will change.

For example, it took about two to three weeks, working about 2 hours a day, to convert OpenSSH to the point where the client works. I don't think you'd be able to rewrite OpenSSH in Rust on that kind of schedule.

  • Hi pizlonator, I'm working on a solution with similar goals (I think), but a bit of a different approach. It's a tool that auto-translates[1] (reasonable) C code to a memory-safe subset of C++. The goal is to get it reliable enough that it can be simply inserted as an (optional) build step, so that the source code can be maintained in its original form.

    I'm under the impression that you're more of a low-level/compiler person, but I suggest that a higher level language like (a memory-safe subset of) C++ actually makes for a more desirable "intermediate representation" language, as it's amenable to maintaining information about the "intent" of the code, which can be helpful for optimization. It also allows programmers to provide manually optimized memory-safe implementations for performance-critical parts of the code.

    The memory-safe subset of C++ is somewhat analogous to Rust's in terms of performance and in that it depends on a non-trivial static checker, but it imposes less onerous restrictions than Rust on single-threaded code.

    The auto-translation tool already does the non-trivial (optimization) task of determining whether any (raw) pointer is being used as an array iterator or not. But further work is needed to make the resulting code perform better. The task of optimizing a high-level "intermediate representation" language like (memory-safe) C++ is roughly analogous to optimizing lower-level IR languages, but the results should be more effective because you have more information about the original code, right?

    I think this project could greatly benefit from the kind of effort you've displayed in yours.

    [1]: https://github.com/duneroadrunner/SaferCPlusPlus-AutoTransla...

    • That's cool!

      My plan for Fil-C is to introduce stricter types as an optionally available thing while preserving the property that it's fast to convert C code to Fil-C.

      C++ is easiest to describe, at the guts, in terms of C-style reasoning about pointers. So, the easiest path to convincingly make C++ safe is to convincingly make C safe first, and then implement the C++ stuff around that. It works out that way in the guts of clang/llvm, since my missing C++ support is largely about (a) some missing jank and glue in the frontend that isn't even that interesting and (b) missing llvm IR ops in the FilPizlonatorPass.
