Comment by safercplusplus
2 years ago
Hi pizlonator, I'm working on a solution with similar goals (I think), but a bit of a different approach. It's a tool that auto-translates[1] (reasonable) C code to a memory-safe subset of C++. The goal is to get it reliable enough that it can be simply inserted as an (optional) build step, so that the source code can be maintained in its original form.
I'm under the impression that you're more of a low-level/compiler person, but I suggest that a higher level language like (a memory-safe subset of) C++ actually makes for a more desirable "intermediate representation" language, as it's amenable to maintaining information about the "intent" of the code, which can be helpful for optimization. It also allows programmers to provide manually optimized memory-safe implementations for performance-critical parts of the code.
The memory-safe subset of C++ is somewhat analogous to Rust's safe subset in terms of performance and in that it depends on a non-trivial static checker, but it imposes less onerous restrictions than Rust on single-threaded code.
The auto-translation tool already does the non-trivial (optimization) task of determining whether each (raw) pointer is being used as an array iterator or not. But further work is needed to make the resulting code faster. The task of optimizing a high-level "intermediate representation" language like (memory-safe) C++ is roughly analogous to optimizing lower-level IR languages, but the results should be more effective because you have more information about the original code, right?
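To make the pointer-vs-iterator distinction concrete, here is a rough sketch. It is not the tool's actual output: `CheckedIter` and `sum` are made-up names for illustration. The idea is that a pointer which is only ever dereferenced needs no bounds information, while one that gets incremented or indexed has to carry its bounds with it so each access can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical replacement for a raw pointer that is used as an array iterator.
// It remembers the container and an index, so every dereference is bounds-checked.
template <typename T>
class CheckedIter {
public:
    explicit CheckedIter(std::vector<T>& v, std::size_t i = 0) : vec_(&v), idx_(i) {
        assert(i <= v.size());  // one-past-the-end is a valid starting position
    }
    CheckedIter& operator++() { ++idx_; return *this; }
    T& operator*() const {
        if (idx_ >= vec_->size()) throw std::out_of_range("CheckedIter: out of bounds");
        return (*vec_)[idx_];
    }
private:
    std::vector<T>* vec_;
    std::size_t idx_;
};

// Original C, for comparison:
//   int sum(int *p, size_t n) { int s = 0; while (n--) s += *p++; return s; }
// Because p is incremented, it is classified as an iterator and gets the checked type.
inline int sum(CheckedIter<int> p, std::size_t n) {
    int s = 0;
    while (n--) { s += *p; ++p; }
    return s;
}
```

Asking `sum` for more elements than the vector holds throws instead of reading past the end. This sketch does nothing about the vector itself being destroyed while an iterator is live; that is a separate lifetime problem.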
I think this project could greatly benefit from the kind of effort you've displayed in yours.
[1]: https://github.com/duneroadrunner/SaferCPlusPlus-AutoTransla...
That's cool!
My plan for Fil-C is to introduce stricter types as an optionally available thing while preserving the property that it's fast to convert C code to Fil-C.
C++ is easiest to describe, at the guts, in terms of C-style reasoning about pointers. So, the easiest path to convincingly make C++ safe is to convincingly make C safe first, and then implement the C++ stuff around that. It works out that way in the guts of clang/llvm, since my missing C++ support is largely about (a) some missing jank and glue in the frontend that isn't even that interesting and (b) missing llvm IR ops in the FilPizlonatorPass.
> the easiest path to convincingly make C++ safe is to convincingly make C safe first
Yeah, with all the static analysis, I did end up straying from the easy path. Ugh :) But actually, one thing that C++ provides that I found made things easier is destructors. I mean, I provide a couple of raw pointer replacement types that rely on ("transparently wrapped") target objects checking for any (replacement) pointers still targeting them when they get destroyed.
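A minimal sketch of that destructor idea, assuming a simple reference count. The names `Tracked`, `TrackedPtr`, `fault_handler` and `set_fault_handler` are hypothetical and are not the library's real types. The target object is wrapped so that it knows how many pointers currently target it, and its destructor reports a fault if that number is not zero.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical fault hook. The default prints a message and aborts.
using FaultHandler = void (*)(const char*);
inline void default_fault(const char* msg) { std::fprintf(stderr, "%s\n", msg); std::abort(); }
inline FaultHandler& fault_handler() { static FaultHandler h = default_fault; return h; }
inline void set_fault_handler(FaultHandler h) { assert(h != nullptr); fault_handler() = h; }

template <typename T> class TrackedPtr;

// "Transparent" wrapper: a T plus a count of the pointers currently targeting it.
template <typename T>
class Tracked {
public:
    explicit Tracked(T v) : value_(v) {}
    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;
    ~Tracked() {
        // The check happens here: is anything still pointing at this object?
        if (refs_ != 0) fault_handler()("Tracked: destroyed while still targeted");
    }
    T& get() { return value_; }
private:
    friend class TrackedPtr<T>;
    T value_;
    int refs_ = 0;
};

// Raw pointer replacement: registers with its target on construction and
// unregisters on destruction.
template <typename T>
class TrackedPtr {
public:
    explicit TrackedPtr(Tracked<T>& t) : target_(&t) { ++target_->refs_; }
    TrackedPtr(const TrackedPtr& o) : target_(o.target_) { ++target_->refs_; }
    TrackedPtr& operator=(const TrackedPtr& o) {
        ++o.target_->refs_;  // increment first so self-assignment is harmless
        --target_->refs_;
        target_ = o.target_;
        return *this;
    }
    ~TrackedPtr() { --target_->refs_; }
    T& operator*() const { return target_->value_; }
private:
    Tracked<T>* target_;
};
```

With normal scoping (pointer declared after its target) the pointer dies first and the check passes silently. The fault only fires when a pointer outlives its target, which is exactly the dangling case. A production version would have to stop at that point, as the default handler does, since the surviving pointer is no longer safe to use.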
As you indicated in another comment, you explicitly choose to expose/require zalloc() because you didn't want to make malloc() too "magical" (by hiding the indirect type deduction). In that vein, one maybe nice thing about the "safe C++ subset" solution is that it exposes the entirety of the run-time safety mechanisms, in the sense that it's all in the library code and you can even step through it in the debugger. (It also gives you the option to catch any exceptions thrown by said safety mechanisms. You know, if exceptions are your thing. Otherwise you can provide your own custom "fault handling" code (if you want to log the error, or dump the stack or whatever).)
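As a rough illustration of the "it's all library code" point, here is a sketch in which the check is an ordinary function that throws, and the caller decides what to do with the fault. `SafetyFault`, `checked_at` and `read_or` are invented for this example and are not taken from any real library.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical exception type raised by the library-level checks.
struct SafetyFault : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// An ordinary function: the bounds check is plain code you can set a breakpoint in.
template <typename T>
T& checked_at(std::vector<T>& v, std::size_t i) {
    if (i >= v.size()) {
        throw SafetyFault("index " + std::to_string(i) +
                          " out of bounds (size " + std::to_string(v.size()) + ")");
    }
    return v[i];
}

// Caller-side policy: catch the fault, record it, and continue with a fallback value.
inline int read_or(std::vector<int>& v, std::size_t i, int fallback, std::string& log) {
    try {
        return checked_at(v, i);
    } catch (const SafetyFault& e) {
        log += e.what();
        return fallback;
    }
}
```

Code that doesn't want exceptions could have the check call a user-supplied handler instead of throwing. The checking logic stays the same and only the reporting path changes.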
> There's a ton of literature on ways to make C/C++ safe. I think that the only reason why that path isn't being explored more is that it's the "less fun" option - it doesn't involve blue sky thoughts about new hardware or new languages.
I can't think of any other reason that makes sense either. Anyway, the first thing is to dispel the notion that C and C++ cannot be safe, and it seems like your project is likely to be the first to demonstrate it on some staple C libraries. I'm looking forward to it.