Comment by Animats

7 years ago

It's fascinating to see the problem being addressed. But it all seems to be about "raw pointers", which are mostly for talking to non-Rust code and probably ought to be on the way out as more low-level libraries are implemented in Rust.

The expressive problems with the ownership model mostly involve backpointers. It's hard to create doubly-linked lists or trees with backpointers in safe Rust, short of going to reference counts and weak pointers. This proposal doesn't seem to address that. Backlinks imply a consistency constraint between forward and back pointers, and Rust lacks the expressive power to talk about that.

The other big expressive issue involves partially initialized arrays. The underlying machinery for a vector involves getting uninitialized memory and initializing it one entry at a time as more of the vector's space is used. You can't talk about that within Rust. It's the sort of thing for which a simple inductive proof is needed, once you're able to say things like "Only elements 0..N of this array are valid".

These are places where you need unsafe Rust because you can't say what you mean in safe Rust. That's the first problem, insufficient expressive power. The second problem is proving that what you said is correct. That's a technical compiler/verifier issue.

A model that just talks about "the first N elements of this allocation are initialized" or even "this contiguous subsequence is initialized" is very limiting. E.g. it doesn't allow for a ring buffer, nor a hashmap that leaves some locations uninit.

I don't see how baking special cases for safety into the compiler is any better than having more general tooling, and constructing safe wrappers in Rust.

Also, just for some context, you're often complaining that Rust is too complicated comparing it to C++. Your desires for extra power to reason about pointer-soups (or writing inductive proofs in the language, rather than using existing tooling for it) aren't going to fix that at all, and indeed are likely to add a whole extra level of complexity.

It doesn't seem like a good use of time, or "language-complexity budget", to have the language features that specifically allow `Vec<T>` to be implemented safely, when that can all just be done in (not-that-complicated) normal Rust code. Additionally, reasoning about code (i.e. feeling satisfied in its correctness) is almost always easier than reasoning about code that reasons about code (i.e. feeling satisfied that the proof-checker/compile will correctly prove other code's correctness).

After we have a formal understanding of exactly which invariants unsafe Rust code may rely on and must preserve, we may be able to integrate a proof assistant into Rust to allow all that functionality and more to be exposed safely—we could replace unsafe {} blocks with automatically checked proof annotations! That’s a dream for the far future, but Ralf’s work is an important step toward making that possible.

  • That’s a dream for the far future.

    Which it has been for too long. I was doing automated program proving on basic validity stuff like this over 30 years ago.

    In order to prove something, you first need a notation for talking about it. Few languages let you talk about uninitialized memory space properly. Constructors are too magic.

    Un-initialized memory can be viewed as write-only. You want to show that no un-initialized byte is ever read. Memory comes from the allocator in blocks of un-initialized bytes.

    A constructor is simply a function that takes a block of un-initialized memory and assigns every byte to something meaningful for that type. At the end of construction, the memory block becomes a valid instance of that type.

    Once you can talk about construction as a non-magic operation, you can talk about partial construction of an array. Until you can talk about it, you can't prove anything about it.

    Some types, such as "byte", are fully mapped - any bit pattern is valid. So they can be initialized by anything that fills them. That's how input buffers can be expressed.

    • Note that Rust essentially expresses all of that. It has no magic constructors, and Vec is a block of uninitialized memory along with safe partial construction of array.

      It's true that it's dynamic construction (although other similar constructs allow for doing it for statically sized arrays), but it is demonstrating Rust's power: making safe APIs over `unsafe` code without having to delve into the depths of a complicated compiler/proof-assistant. This is exactly what `unsafe` is for: extending what safe code can do by temporarily stepping outside the bounds of what the language directly encodes.

    • > A constructor is simply a function that takes a block of un-initialized memory and assigns every byte

      In C++, the constructor doesn't initialise every bytes (padding for example)..

      2 replies →

The lack of expressive power in safe Rust is indeed a problem. However, safe Rust is getting more expressive slowly but surely. Examples are non-lexical lifetimes that allow more flexibility locally and generic associated types that allow new types of generic interfaces (allowing, for example, expressing streaming iterators generically). I'm pretty sure there will be more enhancements in expressive power in the future.

More importantly, the lack of soundness guarantees in unsafe Rust is a more critical problem. This is because unsafe is the primary mechanism to supplement safe Rust when safe Rust isn't expressive enough – it's not just for FFI like you claim! That foundation needs to be rock solid, and this work is addressing that problem.

Btw. backpointers require a guarantee that the values pointed at are not moved as long as the pointers themselves are valid. Before, the compiler would ensure that things borrowed from didn't move, but that isn't enough to implement some systems with longer-living or self-referential pointers. And generally Rust doesn't guarantee that things won't move around. But there is going to be support for "pinning" values. That still doesn't get you all you need for backpointers (which have really complicated semantics if you think about it!), but again, one step of expressivity more that allows you to implement backpointers in unsafe code, but then expose a safe interface where you can get "pinned" references to the nodes.

  • Backpointers aren't that bad. You have two pointers, locked together by a simple invariant. That invariant has to be valid both before and after any operation on either pointer or deletion of either object. The problem is that in most languages, it's hard to talk about an invariant between two variables. But what you have to check or prove is trivial. You don't need a full theorem prover, just some standard compiler-type checks.

    That's what I'm getting at here. There are only a few special cases where you really need unsafe code. Backpointers, partially initialized arrays, and foreign functions cover most of it. Yes, you can construct fancy situations like hash maps where there's some performance benefit in having complicated sparse arrays of pointers interspersed with junk. Is it worth it? Zeroing memory is cheap. Go zeros everything and doesn't seem to suffer.

    The use case of Rust is memory safety without garbage collection. Keep that in mind. Try not to get too clever. Too much "unsafe", and it will break. Not just through coding bugs. Because someone will change something that was based on an implicit assumption made in the past by another.

    • You don't get to hand-wave away the complexity by just asserting that it is trivial. The hard part of all static analysis tools is reasoning about pointers.

      > Go zeros everything and doesn't seem to suffer.

      What are you referring to by this? For most CPU bound things, Go is measurably slower than Rust. If you mean that Go is successful as a language, well... that's almost like saying "why do you need structs/value types? Python allocates everything and doesn't seem to suffer". The languages have different domains/primary targets.

> But it all seems to be about "raw pointers", which are mostly for talking to non-Rust code and probably ought to be on the way out as more low-level libraries are implemented in Rust.

Plenty of pure Rust code uses pointers. They're necessary for the implementation of things like `Vec`. Raw pointers aren't going anywhere.

> The expressive problems with the ownership model mostly involve backpointers.

These problems can be trivially solved in 95% of cases with arena allocation. The other 5% fundamentally require bespoke solutions because you're trying to do something elaborate/niche.

This work is far more foundational than any of that. It applies to references just as much as raw pointers- it determines how much the borrow checker is allowed to permit, for example. You can't even begin to write the sorts of things you're suggesting without this.