For some context, Ralf is one of the people who have been working on formally verifying Rust code, and specifically unsafe Rust code. He's already had some really impressive results.
One of the weakest areas of Rust (in some sense) is unsafe; we don't have a proper understanding of what exactly is acceptable in unsafe code. Projects like this one are proposing a model for what those rules should be.
So, if you're like most Rust programmers, this doesn't really affect you, other than making code you depend on a bit higher quality :). If you're an author of unsafe Rust code, this would tell you what you can and can't do, and if you work on a compiler, these rules would tell you what you can and can't optimize.
Additionally, this is pretty neat:
> I hope to explore this question further in the following weeks by implementing a dynamic checker to test the model on real code.
If we accepted this model (which is, of course, a big if; it was just proposed!), then we could take this tool and instrument your unsafe code (an "unsafegrind", if you will) to tell you at runtime if your code trips UB.
This tooling is enabled by MIRI, which is also amazing, but I've gone on too long already. TL;DR: it's sort of a "Rust interpreter" that we've invested in for compile-time function evaluation, but it also pays off with things like this.
Huh, does MIRI mean that Rust is actually doing some form of partial evaluation during compilation? What's the best way to learn more about that?
MIRI is short for "MIR Interpreter"; it can run basically any Rust code. But, in the compiler itself, its interface is restricted a bit; we haven't fully defined what "const fn" will be.
https://github.com/solson/miri is probably the best place to start poking around :)
Beyond the technical aspect, the pedagogical side of this is huge.
An argument often made in favor of Rust's static analysis restrictions is that "worst case scenario, you just drop down to unsafe and you're back to C semantics".
However, this only holds true as long as the rules to uphold in unsafe code are easy to explain. C's strict aliasing rule is already considered something of a disaster by many, and it's a far simpler rule.
I hope they get this right.
It's fascinating to see the problem being addressed. But it all seems to be about "raw pointers", which are mostly for talking to non-Rust code and probably ought to be on the way out as more low-level libraries are implemented in Rust.
The expressive problems with the ownership model mostly involve backpointers. It's hard to create doubly-linked lists or trees with backpointers in safe Rust, short of going to reference counts and weak pointers. This proposal doesn't seem to address that. Backlinks imply a consistency constraint between forward and back pointers, and Rust lacks the expressive power to talk about that.
The other big expressive issue involves partially initialized arrays. The underlying machinery for a vector involves getting uninitialized memory and initializing it one entry at a time as more of the vector's space is used. You can't talk about that within Rust. It's the sort of thing for which a simple inductive proof is needed, once you're able to say things like "Only elements 0..N of this array are valid".
These are places where you need unsafe Rust because you can't say what you mean in safe Rust. That's the first problem, insufficient expressive power. The second problem is proving that what you said is correct. That's a technical compiler/verifier issue.
A model that just talks about "the first N elements of this allocation are initialized" or even "this contiguous subsequence is initialized" is very limiting. E.g. it doesn't allow for a ring buffer, nor a hashmap that leaves some locations uninit.
I don't see how baking special cases for safety into the compiler is any better than having more general tooling, and constructing safe wrappers in Rust.
Also, just for some context: you often complain that Rust is too complicated compared to C++. Your desire for extra power to reason about pointer soups (or to write inductive proofs in the language, rather than using existing tooling for it) isn't going to fix that at all, and indeed is likely to add a whole extra level of complexity.
It doesn't seem like a good use of time, or "language-complexity budget", to add language features specifically so that `Vec<T>` can be implemented safely, when that can all just be done in (not-that-complicated) normal Rust code. Additionally, reasoning about code (i.e. feeling satisfied of its correctness) is almost always easier than reasoning about code that reasons about code (i.e. feeling satisfied that the proof-checker/compiler will correctly prove other code's correctness).
After we have a formal understanding of exactly which invariants unsafe Rust code may rely on and must preserve, we may be able to integrate a proof assistant into Rust to allow all that functionality and more to be exposed safely—we could replace unsafe {} blocks with automatically checked proof annotations! That’s a dream for the far future, but Ralf’s work is an important step toward making that possible.
> That’s a dream for the far future.
Which it has been for too long. I was doing automated program proving on basic validity stuff like this over 30 years ago.
In order to prove something, you first need a notation for talking about it. Few languages let you talk about uninitialized memory space properly. Constructors are too magic.
Uninitialized memory can be viewed as write-only. You want to show that no uninitialized byte is ever read. Memory comes from the allocator in blocks of uninitialized bytes.
A constructor is simply a function that takes a block of uninitialized memory and assigns every byte to something meaningful for that type. At the end of construction, the memory block becomes a valid instance of that type.
Once you can talk about construction as a non-magic operation, you can talk about partial construction of an array. Until you can talk about it, you can't prove anything about it.
Some types, such as "byte", are fully mapped: any bit pattern is valid. So they can be initialized by anything that fills them. That's how input buffers can be expressed.
The lack of expressive power in safe Rust is indeed a problem. However, safe Rust is getting more expressive, slowly but surely. Examples are non-lexical lifetimes, which allow more flexibility locally, and generic associated types, which allow new kinds of generic interfaces (for example, expressing streaming iterators generically; see the sketch below). I'm pretty sure there will be more enhancements in expressive power in the future.
More importantly, the lack of soundness guarantees in unsafe Rust is a more critical problem. This is because unsafe is the primary mechanism to supplement safe Rust when safe Rust isn't expressive enough – it's not just for FFI like you claim! That foundation needs to be rock solid, and this work is addressing that problem.
Btw, backpointers require a guarantee that the values pointed at are not moved for as long as the pointers themselves are valid. Until now, the compiler would ensure that things borrowed from didn't move, but that isn't enough to implement some systems with longer-lived or self-referential pointers, and in general Rust doesn't guarantee that things won't move around. But there is going to be support for "pinning" values. That still doesn't get you everything you need for backpointers (which have really complicated semantics if you think about it!), but it's one more step of expressivity: it lets you implement backpointers in unsafe code, then expose a safe interface through which you hand out "pinned" references to the nodes.
Backpointers aren't that bad. You have two pointers, locked together by a simple invariant. That invariant has to be valid both before and after any operation on either pointer or deletion of either object. The problem is that in most languages, it's hard to talk about an invariant between two variables. But what you have to check or prove is trivial. You don't need a full theorem prover, just some standard compiler-type checks.
That's what I'm getting at here. There are only a few special cases where you really need unsafe code. Backpointers, partially initialized arrays, and foreign functions cover most of it. Yes, you can construct fancy situations like hash maps where there's some performance benefit in having complicated sparse arrays of pointers interspersed with junk. Is it worth it? Zeroing memory is cheap. Go zeros everything and doesn't seem to suffer.
The use case of Rust is memory safety without garbage collection. Keep that in mind. Try not to get too clever. Too much "unsafe", and it will break. Not just through coding bugs, but because someone will change something that was based on an implicit assumption someone else made in the past.
> But it all seems to be about "raw pointers", which are mostly for talking to non-Rust code and probably ought to be on the way out as more low-level libraries are implemented in Rust.
Plenty of pure Rust code uses pointers. They're necessary for the implementation of things like `Vec`. Raw pointers aren't going anywhere.
> The expressive problems with the ownership model mostly involve backpointers.
These problems can be trivially solved in 95% of cases with arena allocation. The other 5% fundamentally require bespoke solutions because you're trying to do something elaborate/niche.
This work is far more foundational than any of that. It applies to references just as much as to raw pointers; it determines how much the borrow checker is allowed to permit, for example. You can't even begin to write the sorts of things you're suggesting without this.
It's linked to in the article (with a broken link, if the author's reading this), but I would highly recommend reading https://www.ralfj.de/blog/2018/08/07/2018-07-24/pointers-and....
It was a really amazing read.
Link seems to be live here now: https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html