Bonus: recent talk from Ralf Jung on his group's efforts to precisely specify Rust's operational semantics in executable form in a dialect of Rust: https://youtube.com/watch?v=yoeuW_dSe0o
> On the one hand, compilers would like to exploit the strong guarantees of the type system—particularly those pertaining to aliasing of pointers—in order to unlock powerful intraprocedural optimizations.
How true is this really?
Torvalds has argued for a long time that strict aliasing rules in C are more trouble than they're worth, I find his arguments compelling. Here's one of many examples: https://lore.kernel.org/all/CAHk-=wgq1DvgNVoodk7JKc6BuU1m9Un... (the entire thread worth reading if you find this sort of thing interesting)
Is Rust somehow fundamentally different? Based on limited experience, it seems not (at least, when unsafe is involved...).
I would agree that C's strict aliasing rules are terrible. The rules we are proposing for Rust are very different. They are both more useful for compilers and, in my opinion, less onerous for programmers. We also have an actual in-language opt-out: use raw pointers. And finally, we have a tool you can use to check your code.
But in the end, it's a trade-off, like everything in language design. (In life, really. ;) We think that in Rust we may have found a new sweet spot for this kind of optimizations. Time will tell whether we are right.
As someone who has been writing a lot of unsafe Rust (mostly in an embedded context), I'm thrilled about and thankful for the work that you, your students, and the opsem team are doing.
When you're working with anything below the application level, C's confusing and underspecified rules about UB are almost impossible to keep track of, especially when it comes to aliasing and volatile/MMIO. The spec is so difficult to read and full of complicated cross-references that to actually get a practical answer you have to look for a random Stack Overflow post that may or may not have a correct interpretation of the spec, and may or may not address your specific problem.
Rust right now feels a lot harder to work with, because the spec isn't done. When you have a concrete question about a piece of code, like "is this conversion from an &mut to a *mut and back sound", and you try to look for documentation on it, you get either "Nobody knows, Rust aliasing model isn't defined"; a hand-wavy explanation that is not rigorous or specific; or a model like Stack Borrows or Tree Borrows that's defined a little too formally for easy digestion :)
But when I really started digging, I realized just how much cleaner Rust's semantics are. References aren't actually hard, Tree Borrows basically boils down to "while an &mut reference is live, you can only access the value through pointers or references derived from that reference". Pointer operations have straightforward semantics, there's no confusing notions of typed memory, and no UB "just because" for random things like integer overflow. It's just so much less complicated to understand than C's abstract machine.
I'm really looking forward to things like MiniRust, and to an aliasing model making it into the Reference / other documentation, because at that point I feel like unsafe Rust will be way easier to write confidently and correctly than C.
Congrats on the publication, and thanks again for the work you all have put into this.
Thanks for replying Ralf. I'm barely qualified to have an opinion about these things, if I am at all :)
> The rules we are proposing for Rust are very different. They are both more useful for compilers and, in my opinion, less onerous for programmers.
My question was too vague, what I meant to ask was: what aliasing optimizations will be possible in Rust that aren't possible in C?
Example 18 in the paper is one, if I'm understanding it. But that specific example with a pointer passed to a function seems analogous to what is possible with 'restrict' in C, I'm struggling to come up with more general examples that don't involve gratuitous globals.
It seems to me like having to allow for the possibility of unsafe constrains the ability to do "heroic" optimizations such as what jcranmer described elsewhere in the thread. If that is true to some extent, is there a future where Rust might optimize more aggressively if the programmer promises not to use unsafe anywhere in the program? That's something I've always been curious about.
Rust's aliasing rules are very different from C's.
In C you have a nuclear `restrict` that in my experience does anything only when applied to function arguments across clang & gcc, and type-based aliasing which is both not a generally-usable thing (don't have infinite different copies of the int64_t type (and probably wouldn't want such either)), and annoying (forces using memcpy if you want to reinterpret to a different type).
Whereas with Rust references you have finely-bounded lifetimes and spans and mutability, and it doesn't actually care about the "physical" types, so it is possible to reinterpret memory as both `&mut i32`/`&i32` and `&mut i64`/`&i64` and switch between the two for the same memory, writing/reading halves of/multiple values by the most bog standard Rust safe reads & writes, as long as the unsafe abstraction never gives you overlapping `&mut` references at the same time, or split a `&mut` into multiple non-overlapping `&mut`s.
> don't have infinite different copies of the int64_t type
You can make some, though!
Basically, the idea is to define a class template NoAlias<T, Tag> that contains a single value of type T. Implement operators etc. to make the type useful in practice for working with the wrapped value. Type-based alias rules mean that an access to the value wrapped in NoAlias<int64_t, Tag1> can never alias a value wrapped in NoAlias<int64_t, Tag2>. (Tag1 and Tag2 can just be forward-declared structs that are never defined.)
One time, I even encountered a situation where this was mildly useful.
Take anything Linus says about compilers with a grain of salt--he writes OS kernels, not compilers, and those are pretty different domains.
Alias analysis is extremely important for getting good performance these days--but it should also be remembered that the biggest benefits accrue from the simplest heuristics (like two loads that use the same SSA value as the pointer must alias each other). In LLVM terms, that's BasicAA: a collection of very simple heuristics that primarily amounts to "if we can track down the allocation sites of objects, we can definitively resolve these alias queries; otherwise, we don't know."
The real question that you're trying to ask, though, is what is the value of alias analyses that go beyond the most basic, obvious tests. At the point where the alias queries are no longer trivial to solve, then it's generally the case that what you can do as a result of those queries also shrinks dramatically, pretty much to looking for code motion hazards, and the benefits you get from that are much reduced. One of the experiments I would like to do is measure the total speedup you'd get from a theoretically perfect alias analysis, and my guess is that it's somewhere in the 20% range even on non-HPC code like the Linux kernel [1].
[1] This doesn't account for the heroic optimizations, such as data-layout transformations, that you wouldn't attempt to write without a very high-quality alias analysis. But since we already know that alias analysis doesn't exist in practice, we're not going to attempt those optimizations anyways, so it's not worth including such stuff in prospective speed gains.
> Take anything Linus says about compilers with a grain of salt
I think he's making an argument about CPU behavior here more than about compilers: if we call loads and stores which aliasing optimizations might remove "redundant", he's saying that modern machines with big caches and store buffers make those "redundant" operations so cheap they don't matter in practice for most workloads.
Of course, it's admittedly an oversimplification to reduce aliasing optimizations to simply eliminating loads and stores, you described things which go beyond that.
However, I just ran a silly little test where I added `-fno-strict-aliasing` to CFLAGS and rebuilt the world on one of my gentoo build machines, it only got 1.5% slower at compiling an allmodconfig linux kernel (gcc-14):
> I spoke to Apple folks when their compiler team switched the default to strict aliasing. They reported that it made key workloads 5-10% faster and the fixes were much easier to do and upstream than I would have expected. My view of -fstrict-aliasing at the time was that it was a flag that let you generate incorrect code that ran slightly faster. They had actual data that convinced me otherwise.
Maybe Linus by writing kernels, not compilers, has even more to say, because his use-case is much more practical than anything that compiler designers could imagine.
I’d be interested to see a more thorough analysis, but there is a simple way to gauge this - rip out all the parts of the compiler where aliasing information is propagated to LLVM, and see what happens to performance.
I found a claim that noalias contributes about 5% performance improvement in terms of runtimes[0], though the data is obviously very old.
> I found a claim that noalias contributes about 5% performance improvement
Note that that comment is implying a range, from 0% improvement on some benchmarks to 5% improvement on others. It suggests that 5% is generally in the ballpark of the upper bound of what you should expect from putting noalias on mutable references, but that some specific cases could see better results.
And TBAA is much easier for programmers to wrangle than the aliasing of Rust, right? The corresponding aliasing feature for C would be _restrict_, which is rarely used.
Though Linus and Linux turns off even strict aliasing/TBAA.
While I can't name the product I work on, we also use -fno-strict-aliasing. The problem with these optimisations is that they can only be done safely if you can prove aliasing never happens, which is equivalent to solving the halting problem in C++. In Rust I suspect the stronger type system can actually prove that aliasing doesn't happen in select cases. In any case, I can always manually do the optimisations enabled by strict aliasing in hot code, but I can never undo a customer losing data due to miscompilation.
> which is equivalent to solving the halting problem
Most worthwhile undefined behaviour is. If the compiler could tell whether it was happening or not, it would be defined as an error. Surely detecting whether a tree borrow violation happens is also equivalent to solving the halting problem?
When unsafe is involved, sure, but that's a pretty small fraction of Rust code as a whole. As far as I can tell, a lot of his argument seems to be more against needing language changes in C to take advantage of strict aliasing that come at the cost of expanding the circumstances where UB can occur, but I don't really see how those would apply to safe Rust when it already has a perfectly safe way to express a reference guaranteed to not have any alias, i.e. `&mut T`. If the compiler authors came up with better ways to generate optimized code with unaliased pointers, I don't see why Rust would need to make any language changes in order to take advantage of them. That doesn't necessarily mean that there there is any significant untapped potential for these sorts of optmizations of course, or that the amount of effort to identify and implement them would be worthwhile for a toolchain like LLVM that is used for far more than just Rust, but it's not clear to me why the arguments he gives would be relevant to Rust.
Personally, I would like compilers to better exploit vectorization, which can get you 2x to 10x faster on random things within typical workloads, rather than worry about dubious optimizations that have performance improvements that may or may not be caused by changing the alignment of code and data blocks.
I would like to see more effort dedicated to basic one liners that show up in real code like counting how many of a given character are in a string. E.g. `for (str) |e| count += e == '%'`. For this, LLVM spits out a loop that wants to do horizontal addition every iteration on x86-64 targets with vectors, at least. Let's focus on issues that can easily net a 2x performance gain before going after that 1-2% that people think pointer aliasing gets you.
Pointer aliasing information is often crucial for vectorization. So in that sense TB is a big boost for vectorization.
Also, note that the paper discussed here is about the model / language specification that defines the envelope of what the compiler is allowed to optimize. Actually getting the compiler to optimize concrete real-world cases is an entirely separate line of work, and needs completely different expertise -- expertise me and my group do not have. We can only lay the framework for compiler wizards to have fun with. :)
Pointer aliasing is necessary for auto vectorization, because you can't perform SIMD if the data you're modifying overlaps with the data you're reading and since the compiler is only allowed to modify the code in a way that is legal for all inputs, it will be conservative and refuse to vectorize your code rather than break it in situations with pointer aliasing.
Maybe this was a too convoluted way of saying this:
Loading something from main memory into a register creates a locally cached copy. This mini cache needs to be invalidated whenever a pointer can potentially write to the location in main memory that this copy is caching. In other words, you need cache synchronization down to the register level, including micro architectural registers that are implementation details of the processor in question. Rather than do that, if you can prove that you are the exclusive owner of a region in memory, you know that your copy is the most up to date version and you know when it gets updated or not. This means you are free to copy the main memory into your vector register and do anything you want, including scalar pointer writes to main memory, since you know they are unrelated and will not invalidate your vector registers.
Strict aliasing rules are useful conditional on them being sufficiently expressive and sensible, otherwise they just create pointless headaches that require kludgy workarounds or they are just disabled altogether. I don't think there is much disagreement that C strict aliasing rules are pretty broken. There is no reason a language like Rust can't be designed with much more sensible strict aliasing rules. Even C++ has invested in providing paths to more flexibility around strict aliasing than C provides.
But like Linus, I've noticed it doesn't seem to make much difference outside of obvious narrow cases.
>There is no reason a language like Rust can't be designed with much more sensible strict aliasing rules.
The aliasing rules of Rust for mutable references are different and more difficult than strict aliasing in C and C++.
Strict aliasing in C and C++ are also called TBAA, since they are based on compatible types. If types are compatible, pointers can alias. This is different in Rust, where mutable references absolutely may never alias, not even if the types are the same.
Rust aliasing is more similar to C _restrict_.
The Linux kernel goes in the other direction and has strict aliasing optimization disabled.
> Compiler people think aliasing matters. It very seldom does. And VLIW will never become practical for entirely unrelated reasons (read: OoO is fundamentally superior to VLIW in general purpose computing).
“Since the number of transistors on a chip has grown, the perceived disadvantages of the VLIW have diminished in importance. VLIW architectures are growing in popularity, especially in the embedded system market, where it is possible to customize a processor for an application in a system-on-a-chip.”
That may be untrue or no longer true, but clicking around looking a the Wikipedia pages for several of the mentioned VLIW designs, I couldn’t find conclusive evidence for that (but, as with that VLIW page, the pages I looked at may be out of date, weren’t clear as to whether chips still were in production, whether newer generations actually still are VLIW, etc.)
It is mostly useful on arrays/numeric code, probably next to useless otherwise. Numerics people was the ones who sponsored much of compiler/optimization work in the first place, that's how strict aliasing came to be.
I don't think the usefulness is that skewed towards numerics?
Both clang/llvm and gcc can do alias checking at runtime if they can't at compile-time, which makes loops vectorizable without alias info, at the cost of a bit of constant overhead for checking aliasing. (there's the exception of gather loads though, where compile-time aliasing info is basically required)
And on the other hand there's good potential for benefit to normal code (esp. code with layers of abstractions) - if you have a `&i32`, or any other immutable reference, it's pretty useful for compiler to be able to deduplicate/CSE loads/computations from it from across the whole function regardless of what intermediate writes to potentially-other things there are.
Hmm i just tested out the claim that the following rust code would be rejected ( Example 4 in the paper).
And it seams to not be the case on the stable compiler version?
fn write(x: &mut i32) {*x = 10}
fn main() {
let x = &mut 0;
let y = x as *mut i32;
//write(x); // this should use the mention implicit twophase borrow
*x = 10; // this should not and therefore be rejected by the compiler
unsafe {*y = 15 };
}
Stacked borrows is miri's runtime model. Run it under miri and you will see the error reported for the `*x = 10;` version but not the `write(x);` version - "Undefined Behavior: attempting a write access using [...] but that tag does not exist in the borrow stack for this location".
rustc itself has no reason to reject either version, because y is a *mut and thus has no borrow/lifetime relation to the &mut that x is, from a compile-time/typesystem perspective.
The paper is describing the behavior under the proposed Tree Borrows model, not the current borrow checker implementation which uses a more limited analysis that doesn't detect this particular conflict between the raw pointer and mutable reference.
Amazing work, I remember reading the Tree Borrows spec? on Nevin's website a couple years ago and being thoroughly impressed by how it solves some pretty gnarly issue quite elegantly. And in my experience [1] [2] it does indeed allow for sensible code that is illegal under Stacked Borrows.
I wonder if Rust or future PL would evolve into allowing multiple borrow checker implementations with varying characteristics (compile speed, runtime speed, algorithm flexibility, etc.) that projects can choose from.
Rust already supports switching between borrow checker implementations.
It has migrated from a scope-based borrow checker to non-lexical borrow checker, and has next experimental Polonius implementation as an option. However, once the new implementation becomes production-ready, the old one gets discarded, because there's no reason to choose it. Borrow checking is fast, and the newer ones accept strictly more (correct) programs.
You also have Rc and RefCell types which give you greater flexibility at cost of some runtime checks.
>I recommend watching the video @nerditation linked. I believe Amanda mentioned somewhere that Polonius is 5000x slower than the existing borrow-checker; IIRC the plan isn't to use Polonius instead of NLL, but rather use NLL and kick off Polonius for certain failure cases.
I think GP is talking about somehow being able to, for example, more seamlessly switch between manual borrowing and "checked" borrowing with Rc and RefCell.
We already have that by having multiple approaches via affine types (what Rust uses), linear types, effects, dependent types, formal proofs.
All have different costs and capabilities across implementation, performance and developer experience.
Then we have what everyone else besides Rust is actually going for, the productivity of automatic resource management (regardless of how), coupled with one of the type systems above, only for performance critical code paths.
I'd just like to interject for a moment. What you’re referring to as "affine types", is in fact, Uniqueness Types. The difference has to do with how they interact with unrestricted types. In Rust, these "unrestricted types" are references (which can be used multiple times due to implementing Copy).
Uniqueness types allow functions to place a constraint on the caller ("this argument cannot be aliased when you pass it to me"), but places no restriction on the callee. This is useful for Rust, because (among other reasons) if a value is not aliased you can free it and be sure that you're not leaving behind references to freed data.
Affine types are the opposite - they allow the caller to place a restriction on the callee ("I'm passing you this value, but you may use it at most once"), which is not something possible to express in Rust's type system, because the callee is always free to create a reference from its argument and pass that reference to multiple functions..
I would love some sort of affine types in languages like Kotlin, it just makes cleaner code organization in my opinion.
Doesn't matter if it's purely "syntaxical" because the language is garbage collected, just the fact of specifying what owns what and be explicit about multiple references is great imo.
Some sort of effects systems can already be simulated with Kotlin features too.
What you actually want is the underlying separation logic, so you can precisely specify function preconditions and prove mid-function conditions, and the the optomizer can take all those "lemmas" and go hog-wiled, right up to but not past what is allowed by the explicitly stated invariants.
"Rust", in this context, is "merely" "the usual invariants that people want" and "a suite of optimizations that assume those usual invariants, but not more or less".
Rust's borrow checker has a fairly minimal compile time cost and does not impact codegen at all. Most of the compile time is spent on trait resolution, monomophization, optimization passes in LLVM, and linking.
> As I understand it the borrow checker only has false negatives but no false positives, correct?
The borrow checker is supposed to be a sound static analysis, yes. I think Ralf Jung's comment at https://news.ycombinator.com/item?id=44511416 says soundness hasn't been proved relative to tree borrows yet.
> Maybe a dumb question but couldn't you just run multiple implementations in parallel threads and whichever finishes first with a positive result wins?
IIUC when you're compiling reasonably-sized programs you're already using all the cores, so parallelizing here doesn't seem like it's going to net you much gain, especially if it means you're doing a lot of extra work.
This presumes that checking composes which may not if you have orthogonal checker implementations. You might end up risking accepting an invalid program because part of it is valid under one checker, part under another, but the combination isn't actually valid. But maybe that's not actually possible in practice.
I cannot imagine how that would work. You couldn't combine code that expect different borrowing rules to be applied. You'd effectively be creating as many sub-dialects as there are borrow checker implementations.
I'm guessing you're referring to being able to change models without needing to change the code, but it's worth mentioning that there already is a mechanism to defer borrow-checking until runtime in Rust in the form of RefCell. This doesn't change the underlying exclusivity rules to allow aliasing mutable borrows, but it does allow an alternative to handling everything at compile time.
> The problem with unsafe code is that it can do things like this:
fn main() {
let mut x = 42;
let ptr = &mut x as *mut i32;
let val = unsafe { write_both(&mut *ptr, &mut *ptr) };
println!("{val}");
}
No it can't? Using pointers to coexist multiple mutable references to the same variable is undefined behavior. Unless I'm just misunderstanding the point they're trying to make here.
The point of this work is to pin down the exact boundaries of undefined behavior. Certainly the code above is accepted by the Rust compiler, but it also breaks rules. What rules? In essence, we know that:
- Anything accepted by the borrow checker is legal
- Unsafe can express illegal / undefined behavior
- There's some set of rules, broader than what the borrow checker can check, that is still legal / defined behavior
The goal of this line of work is to precisely specify that set of rules. The outlines are clear (basically, no writable pointers should alias) but the details (interior pointers, invalidation of iterators, is it creating or using bad pointers that's bad, etc) are really hard. The previous paper in this series, on Stacked Borrows, was simpler but more restrictive, and real-world unsafe code often failed its rules (while still seeming correct). Tree Borrows is broader and allows more while still being provably safe.
Note that we have not yet proven this. :) I hope to one day prove that every program accepted by the borrow checker is compatible with TB, but right now, that is only a (very well-tested) conjecture.
> Using pointers to coexist multiple mutable references to the same variable is undefined behavior.
Yes, but which exact rule does it violate? What is the exact definition that says that it is UB?
Tree Borrows is a proposal for exactly such a definition.
"code can do things like this" here means "you can write this code and compile it and run it and it will do something, and unless we have something like Tree Borrows we have no argument for why there would be anything wrong with this code".
You seem to have already accepted that we need something like Tree Borrows (i.e., we should say code like this is UB). This part of the paper is arguing why we need something like Tree Borrows. :)
You're already getting a lot of replies, and I don't want to pile on, but I think the clearest way to see the intent there is at the start of the following paragraph:
> Given that aliasing optimizations are something that the Rust compiler developers clearly want to support, we need some way of “ruling out” counterexamples like the one above from consideration.
I believe that's exactly the point: it's too easy to violate constraints like not allowing multiple mutable references. Unsafe is meant for cases where the validity of the code is difficult to prove with rust's lifetime analysis, but can be abused to do much more than that.
Recent blog post from Ralf Jung providing some extra context: https://www.ralfj.de/blog/2025/07/07/tree-borrows-paper.html
Bonus: recent talk from Ralf Jung on his group's efforts to precisely specify Rust's operational semantics in executable form in a dialect of Rust: https://youtube.com/watch?v=yoeuW_dSe0o
> On the one hand, compilers would like to exploit the strong guarantees of the type system—particularly those pertaining to aliasing of pointers—in order to unlock powerful intraprocedural optimizations.
How true is this really?
Torvalds has argued for a long time that strict aliasing rules in C are more trouble than they're worth, I find his arguments compelling. Here's one of many examples: https://lore.kernel.org/all/CAHk-=wgq1DvgNVoodk7JKc6BuU1m9Un... (the entire thread worth reading if you find this sort of thing interesting)
Is Rust somehow fundamentally different? Based on limited experience, it seems not (at least, when unsafe is involved...).
I would agree that C's strict aliasing rules are terrible. The rules we are proposing for Rust are very different. They are both more useful for compilers and, in my opinion, less onerous for programmers. We also have an actual in-language opt-out: use raw pointers. And finally, we have a tool you can use to check your code.
But in the end, it's a trade-off, like everything in language design. (In life, really. ;) We think that in Rust we may have found a new sweet spot for this kind of optimizations. Time will tell whether we are right.
As someone who has been writing a lot of unsafe Rust (mostly in an embedded context), I'm thrilled about and thankful for the work that you, your students, and the opsem team are doing.
When you're working with anything below the application level, C's confusing and underspecified rules about UB are almost impossible to keep track of, especially when it comes to aliasing and volatile/MMIO. The spec is so difficult to read and full of complicated cross-references that to actually get a practical answer you have to look for a random Stack Overflow post that may or may not have a correct interpretation of the spec, and may or may not address your specific problem.
Rust right now feels a lot harder to work with, because the spec isn't done. When you have a concrete question about a piece of code, like "is this conversion from an &mut to a *mut and back sound", and you try to look for documentation on it, you get either "Nobody knows, Rust aliasing model isn't defined"; a hand-wavy explanation that is not rigorous or specific; or a model like Stack Borrows or Tree Borrows that's defined a little too formally for easy digestion :)
But when I really started digging, I realized just how much cleaner Rust's semantics are. References aren't actually hard, Tree Borrows basically boils down to "while an &mut reference is live, you can only access the value through pointers or references derived from that reference". Pointer operations have straightforward semantics, there's no confusing notions of typed memory, and no UB "just because" for random things like integer overflow. It's just so much less complicated to understand than C's abstract machine.
I'm really looking forward to things like MiniRust, and to an aliasing model making it into the Reference / other documentation, because at that point I feel like unsafe Rust will be way easier to write confidently and correctly than C.
Congrats on the publication, and thanks again for the work you all have put into this.
22 replies →
Agreed about C's aliasing rules. Fortran had a better set of defaults.
Thanks for replying Ralf. I'm barely qualified to have an opinion about these things, if I am at all :)
> The rules we are proposing for Rust are very different. They are both more useful for compilers and, in my opinion, less onerous for programmers.
My question was too vague, what I meant to ask was: what aliasing optimizations will be possible in Rust that aren't possible in C?
Example 18 in the paper is one, if I'm understanding it. But that specific example with a pointer passed to a function seems analogous to what is possible with 'restrict' in C, I'm struggling to come up with more general examples that don't involve gratuitous globals.
It seems to me like having to allow for the possibility of unsafe constrains the ability to do "heroic" optimizations such as what jcranmer described elsewhere in the thread. If that is true to some extent, is there a future where Rust might optimize more aggressively if the programmer promises not to use unsafe anywhere in the program? That's something I've always been curious about.
Rust's aliasing rules are very different from C's.
In C you have a nuclear `restrict` that in my experience does anything only when applied to function arguments across clang & gcc, and type-based aliasing which is both not a generally-usable thing (don't have infinite different copies of the int64_t type (and probably wouldn't want such either)), and annoying (forces using memcpy if you want to reinterpret to a different type).
Whereas with Rust references you have finely-bounded lifetimes and spans and mutability, and it doesn't actually care about the "physical" types, so it is possible to reinterpret memory as both `&mut i32`/`&i32` and `&mut i64`/`&i64` and switch between the two for the same memory, writing/reading halves of/multiple values by the most bog standard Rust safe reads & writes, as long as the unsafe abstraction never gives you overlapping `&mut` references at the same time, or split a `&mut` into multiple non-overlapping `&mut`s.
> don't have infinite different copies of the int64_t type
You can make some, though!
Basically, the idea is to define a class template NoAlias<T, Tag> that contains a single value of type T. Implement operators etc. to make the type useful in practice for working with the wrapped value. Type-based alias rules mean that an access to the value wrapped in NoAlias<int64_t, Tag1> can never alias a value wrapped in NoAlias<int64_t, Tag2>. (Tag1 and Tag2 can just be forward-declared structs that are never defined.)
One time, I even encountered a situation where this was mildly useful.
4 replies →
Take anything Linus says about compilers with a grain of salt--he writes OS kernels, not compilers, and those are pretty different domains.
Alias analysis is extremely important for getting good performance these days--but it should also be remembered that the biggest benefits accrue from the simplest heuristics (like two loads that use the same SSA value as the pointer must alias each other). In LLVM terms, that's BasicAA: a collection of very simple heuristics that primarily amounts to "if we can track down the allocation sites of objects, we can definitively resolve these alias queries; otherwise, we don't know."
The real question that you're trying to ask, though, is what is the value of alias analyses that go beyond the most basic, obvious tests. At the point where the alias queries are no longer trivial to solve, then it's generally the case that what you can do as a result of those queries also shrinks dramatically, pretty much to looking for code motion hazards, and the benefits you get from that are much reduced. One of the experiments I would like to do is measure the total speedup you'd get from a theoretically perfect alias analysis, and my guess is that it's somewhere in the 20% range even on non-HPC code like the Linux kernel [1].
[1] This doesn't account for the heroic optimizations, such as data-layout transformations, that you wouldn't attempt to write without a very high-quality alias analysis. But since we already know that alias analysis doesn't exist in practice, we're not going to attempt those optimizations anyways, so it's not worth including such stuff in prospective speed gains.
> Take anything Linus says about compilers with a grain of salt
I think he's making an argument about CPU behavior here more than about compilers: if we call loads and stores which aliasing optimizations might remove "redundant", he's saying that modern machines with big caches and store buffers make those "redundant" operations so cheap they don't matter in practice for most workloads.
Of course, it's admittedly an oversimplification to reduce aliasing optimizations to simply eliminating loads and stores, you described things which go beyond that.
However, I just ran a silly little test where I added `-fno-strict-aliasing` to CFLAGS and rebuilt the world on one of my gentoo build machines, it only got 1.5% slower at compiling an allmodconfig linux kernel (gcc-14):
That's on a shiny new znver5.
1 reply →
Here's another data point: https://lobste.rs/s/yubalv/pointers_are_complicated_ii_we_ne...
> I spoke to Apple folks when their compiler team switched the default to strict aliasing. They reported that it made key workloads 5-10% faster and the fixes were much easier to do and upstream than I would have expected. My view of -fstrict-aliasing at the time was that it was a flag that let you generate incorrect code that ran slightly faster. They had actual data that convinced me otherwise.
5 replies →
Maybe Linus by writing kernels, not compilers, has even more to say, because his use-case is much more practical than anything that compiler designers could imagine.
5 replies →
>How true is this really?
I’d be interested to see a more thorough analysis, but there is a simple way to gauge this - rip out all the parts of the compiler where aliasing information is propagated to LLVM, and see what happens to performance.
I found a claim that noalias contributes about 5% performance improvement in terms of runtimes[0], though the data is obviously very old.
https://github.com/rust-lang/rust/issues/54878#issuecomment-...
> I found a claim that noalias contributes about 5% performance improvement
Note that that comment is implying a range, from 0% improvement on some benchmarks to 5% improvement on others. It suggests that 5% is generally in the ballpark of the upper bound of what you should expect from putting noalias on mutable references, but that some specific cases could see better results.
While both involve aliasing, C's strict aliasing and Rust's aliasing are two different things. Rust pretty explicitly did not adopt the C style.
C's aliasing is based on type alone, hence its other name "type based alias analysis" or TBAA.
And TBAA is much easier for programmers to wrangle than the aliasing of Rust, right? The corresponding aliasing feature for C would be _restrict_, which is rarely used.
Though Linus and Linux turns off even strict aliasing/TBAA.
2 replies →
While I can't name the product I work on, we also use -fno-strict-aliasing. The problem with these optimisations is that they can only be done safely if you can prove aliasing never happens, which is equivalent to solving the halting problem in C++. In Rust I suspect the stronger type system can actually prove that aliasing doesn't happen in select cases. In any case, I can always manually do the optimisations enabled by strict aliasing in hot code, but I can never undo a customer losing data due to miscompilation.
> actually prove that aliasing doesn't happen in select cases
In the safe subset of Rust it's guaranteed in all cases. Even across libraries. Even in multi-threaded code.
6 replies →
> which is equivalent to solving the halting problem
Most worthwhile undefined behaviour is. If the compiler could tell whether it was happening or not, it would be defined as an error. Surely detecting whether a tree borrow violation happens is also equivalent to solving the halting problem?
When unsafe is involved, sure, but that's a pretty small fraction of Rust code as a whole. As far as I can tell, a lot of his argument seems to be more against needing language changes in C to take advantage of strict aliasing that come at the cost of expanding the circumstances where UB can occur, but I don't really see how those would apply to safe Rust when it already has a perfectly safe way to express a reference guaranteed to not have any alias, i.e. `&mut T`. If the compiler authors came up with better ways to generate optimized code with unaliased pointers, I don't see why Rust would need to make any language changes in order to take advantage of them. That doesn't necessarily mean that there there is any significant untapped potential for these sorts of optmizations of course, or that the amount of effort to identify and implement them would be worthwhile for a toolchain like LLVM that is used for far more than just Rust, but it's not clear to me why the arguments he gives would be relevant to Rust.
Personally, I would like compilers to better exploit vectorization, which can get you 2x to 10x faster on random things within typical workloads, rather than worry about dubious optimizations that have performance improvements that may or may not be caused by changing the alignment of code and data blocks.
I would like to see more effort dedicated to basic one liners that show up in real code like counting how many of a given character are in a string. E.g. `for (str) |e| count += e == '%'`. For this, LLVM spits out a loop that wants to do horizontal addition every iteration on x86-64 targets with vectors, at least. Let's focus on issues that can easily net a 2x performance gain before going after that 1-2% that people think pointer aliasing gets you.
Pointer aliasing information is often crucial for vectorization. So in that sense TB is a big boost for vectorization.
Also, note that the paper discussed here is about the model / language specification that defines the envelope of what the compiler is allowed to optimize. Actually getting the compiler to optimize concrete real-world cases is an entirely separate line of work, and needs completely different expertise -- expertise me and my group do not have. We can only lay the framework for compiler wizards to have fun with. :)
Pointer aliasing is necessary for auto vectorization, because you can't perform SIMD if the data you're modifying overlaps with the data you're reading and since the compiler is only allowed to modify the code in a way that is legal for all inputs, it will be conservative and refuse to vectorize your code rather than break it in situations with pointer aliasing.
Maybe this was a too convoluted way of saying this:
Loading something from main memory into a register creates a locally cached copy. This mini cache needs to be invalidated whenever a pointer can potentially write to the location in main memory that this copy is caching. In other words, you need cache synchronization down to the register level, including micro architectural registers that are implementation details of the processor in question. Rather than do that, if you can prove that you are the exclusive owner of a region in memory, you know that your copy is the most up to date version and you know when it gets updated or not. This means you are free to copy the main memory into your vector register and do anything you want, including scalar pointer writes to main memory, since you know they are unrelated and will not invalidate your vector registers.
Strict aliasing rules are useful conditional on them being sufficiently expressive and sensible, otherwise they just create pointless headaches that require kludgy workarounds or they are just disabled altogether. I don't think there is much disagreement that C strict aliasing rules are pretty broken. There is no reason a language like Rust can't be designed with much more sensible strict aliasing rules. Even C++ has invested in providing paths to more flexibility around strict aliasing than C provides.
But like Linus, I've noticed it doesn't seem to make much difference outside of obvious narrow cases.
>There is no reason a language like Rust can't be designed with much more sensible strict aliasing rules.
The aliasing rules of Rust for mutable references are different and more difficult than strict aliasing in C and C++.
Strict aliasing in C and C++ are also called TBAA, since they are based on compatible types. If types are compatible, pointers can alias. This is different in Rust, where mutable references absolutely may never alias, not even if the types are the same.
Rust aliasing is more similar to C _restrict_.
The Linux kernel goes in the other direction and has strict aliasing optimization disabled.
5 replies →
> Compiler people think aliasing matters. It very seldom does. And VLIW will never become practical for entirely unrelated reasons (read: OoO is fundamentally superior to VLIW in general purpose computing).
I love how long Linus has been right about this.
Linus worked for Transmeta in 2003. That company built a VLIW CPU.
I don’t think you were a visionary if you stated VLIW is not practical for general purpose computing after that time.
Also “VLIW will never become practical” is a stronger claim than “VLIW will never become practical for general purpose computing”, and may be up for debate. https://en.wikipedia.org/wiki/Very_long_instruction_word says
“Since the number of transistors on a chip has grown, the perceived disadvantages of the VLIW have diminished in importance. VLIW architectures are growing in popularity, especially in the embedded system market, where it is possible to customize a processor for an application in a system-on-a-chip.”
That may be untrue or no longer true, but clicking around looking a the Wikipedia pages for several of the mentioned VLIW designs, I couldn’t find conclusive evidence for that (but, as with that VLIW page, the pages I looked at may be out of date, weren’t clear as to whether chips still were in production, whether newer generations actually still are VLIW, etc.)
He's right about OoO but not about aliasing.
It is mostly useful on arrays/numeric code, probably next to useless otherwise. Numerics people was the ones who sponsored much of compiler/optimization work in the first place, that's how strict aliasing came to be.
I don't think the usefulness is that skewed towards numerics?
Both clang/llvm and gcc can do alias checking at runtime if they can't at compile-time, which makes loops vectorizable without alias info, at the cost of a bit of constant overhead for checking aliasing. (there's the exception of gather loads though, where compile-time aliasing info is basically required)
And on the other hand there's good potential for benefit to normal code (esp. code with layers of abstractions) - if you have a `&i32`, or any other immutable reference, it's pretty useful for compiler to be able to deduplicate/CSE loads/computations from it from across the whole function regardless of what intermediate writes to potentially-other things there are.
2 replies →
The Stacked Borrows mentioned had threads in 2020 and 2018
https://news.ycombinator.com/item?id=17715399
The PLDI talk is also available: https://www.youtube.com/watch?v=CJi_Fcs4bak
Thanks, I added it to the website.
Hmm i just tested out the claim that the following rust code would be rejected ( Example 4 in the paper).
And it seams to not be the case on the stable compiler version?
Stacked borrows is miri's runtime model. Run it under miri and you will see the error reported for the `*x = 10;` version but not the `write(x);` version - "Undefined Behavior: attempting a write access using [...] but that tag does not exist in the borrow stack for this location".
rustc itself has no reason to reject either version, because y is a *mut and thus has no borrow/lifetime relation to the &mut that x is, from a compile-time/typesystem perspective.
Ah that make sense. Thanks for clarifying.
The paper mentions that the authors implemented tree borrows in Miri. Is this change likely to be adopted by Miri as the default model going forward?
The paper is describing the behavior under the proposed Tree Borrows model, not the current borrow checker implementation which uses a more limited analysis that doesn't detect this particular conflict between the raw pointer and mutable reference.
Here's the implementation in Miri for those interested: https://github.com/rust-lang/miri/tree/master/src/borrow_tra...
Amazing work, I remember reading the Tree Borrows spec? on Nevin's website a couple years ago and being thoroughly impressed by how it solves some pretty gnarly issue quite elegantly. And in my experience [1] [2] it does indeed allow for sensible code that is illegal under Stacked Borrows.
[1] https://github.com/Voultapher/sort-research-rs/blob/main/wri... Miri column
[2] https://github.com/rust-lang/rust/blob/6b3ae3f6e45a33c2d95fa...
I wonder if Rust or future PL would evolve into allowing multiple borrow checker implementations with varying characteristics (compile speed, runtime speed, algorithm flexibility, etc.) that projects can choose from.
Rust already supports switching between borrow checker implementations.
It has migrated from a scope-based borrow checker to non-lexical borrow checker, and has next experimental Polonius implementation as an option. However, once the new implementation becomes production-ready, the old one gets discarded, because there's no reason to choose it. Borrow checking is fast, and the newer ones accept strictly more (correct) programs.
You also have Rc and RefCell types which give you greater flexibility at cost of some runtime checks.
The new borrow checker is not yet all that fast. For instance, it was 5000x slower, according to a recent report.
https://users.rust-lang.org/t/polonius-is-more-ergonomic-tha...
>I recommend watching the video @nerditation linked. I believe Amanda mentioned somewhere that Polonius is 5000x slower than the existing borrow-checker; IIRC the plan isn't to use Polonius instead of NLL, but rather use NLL and kick off Polonius for certain failure cases.
I think GP is talking about somehow being able to, for example, more seamlessly switch between manual borrowing and "checked" borrowing with Rc and RefCell.
[dead]
We already have that by having multiple approaches via affine types (what Rust uses), linear types, effects, dependent types, formal proofs.
All have different costs and capabilities across implementation, performance and developer experience.
Then we have what everyone else besides Rust is actually going for, the productivity of automatic resource management (regardless of how), coupled with one of the type systems above, only for performance critical code paths.
> affine types (what Rust uses)
I'd just like to interject for a moment. What you’re referring to as "affine types", is in fact, Uniqueness Types. The difference has to do with how they interact with unrestricted types. In Rust, these "unrestricted types" are references (which can be used multiple times due to implementing Copy).
Uniqueness types allow functions to place a constraint on the caller ("this argument cannot be aliased when you pass it to me"), but places no restriction on the callee. This is useful for Rust, because (among other reasons) if a value is not aliased you can free it and be sure that you're not leaving behind references to freed data.
Affine types are the opposite - they allow the caller to place a restriction on the callee ("I'm passing you this value, but you may use it at most once"), which is not something possible to express in Rust's type system, because the callee is always free to create a reference from its argument and pass that reference to multiple functions..
13 replies →
I would love some sort of affine types in languages like Kotlin, it just makes cleaner code organization in my opinion.
Doesn't matter if it's purely "syntaxical" because the language is garbage collected, just the fact of specifying what owns what and be explicit about multiple references is great imo.
Some sort of effects systems can already be simulated with Kotlin features too.
Programming language theory is so interesting!
What you actually want is the underlying separation logic, so you can precisely specify function preconditions and prove mid-function conditions, and the the optomizer can take all those "lemmas" and go hog-wiled, right up to but not past what is allowed by the explicitly stated invariants.
"Rust", in this context, is "merely" "the usual invariants that people want" and "a suite of optimizations that assume those usual invariants, but not more or less".
Can you help me understand your comment with a simple example? Take slice::split_at and slice::split_at_mut:
What might their triples look like in separation logic?
2 replies →
Rust's borrow checker has a fairly minimal compile time cost and does not impact codegen at all. Most of the compile time is spent on trait resolution, monomophization, optimization passes in LLVM, and linking.
As I understand it the borrow checker only has false negatives but no false positives, correct?
Maybe a dumb question but couldn't you just run multiple implementations in parallel threads and whichever finishes first with a positive result wins?
> As I understand it the borrow checker only has false negatives but no false positives, correct?
The borrow checker is supposed to be a sound static analysis, yes. I think Ralf Jung's comment at https://news.ycombinator.com/item?id=44511416 says soundness hasn't been proved relative to tree borrows yet.
> Maybe a dumb question but couldn't you just run multiple implementations in parallel threads and whichever finishes first with a positive result wins?
IIUC when you're compiling reasonably-sized programs you're already using all the cores, so parallelizing here doesn't seem like it's going to net you much gain, especially if it means you're doing a lot of extra work.
1 reply →
This presumes that checking composes which may not if you have orthogonal checker implementations. You might end up risking accepting an invalid program because part of it is valid under one checker, part under another, but the combination isn't actually valid. But maybe that's not actually possible in practice.
2 replies →
I cannot imagine how that would work. You couldn't combine code that expect different borrowing rules to be applied. You'd effectively be creating as many sub-dialects as there are borrow checker implementations.
FWIW technically the rules are the same. How they go about proving that the rules are upheld for a program is what would be different.
I'm guessing you're referring to being able to change models without needing to change the code, but it's worth mentioning that there already is a mechanism to defer borrow-checking until runtime in Rust in the form of RefCell. This doesn't change the underlying exclusivity rules to allow aliasing mutable borrows, but it does allow an alternative to handling everything at compile time.
Deferring to runtime is not always great, since not only can it incur runtime overhead, the code can also panic if a violation is detected.
1 reply →
What’s wrong with the compile or runtime speed of the current one?
That would result in ecosystem splitting, which isn't great.
From the paper:
> The problem with unsafe code is that it can do things like this:
No it can't? Using pointers to coexist multiple mutable references to the same variable is undefined behavior. Unless I'm just misunderstanding the point they're trying to make here.
The point of this work is to pin down the exact boundaries of undefined behavior. Certainly the code above is accepted by the Rust compiler, but it also breaks rules. What rules? In essence, we know that:
- Anything accepted by the borrow checker is legal
- Unsafe can express illegal / undefined behavior
- There's some set of rules, broader than what the borrow checker can check, that is still legal / defined behavior
The goal of this line of work is to precisely specify that set of rules. The outlines are clear (basically, no writable pointers should alias) but the details (interior pointers, invalidation of iterators, is it creating or using bad pointers that's bad, etc) are really hard. The previous paper in this series, on Stacked Borrows, was simpler but more restrictive, and real-world unsafe code often failed its rules (while still seeming correct). Tree Borrows is broader and allows more while still being provably safe.
> allows more while still being provably safe.
Note that we have not yet proven this. :) I hope to one day prove that every program accepted by the borrow checker is compatible with TB, but right now, that is only a (very well-tested) conjecture.
2 replies →
> Using pointers to coexist multiple mutable references to the same variable is undefined behavior.
Yes, but which exact rule does it violate? What is the exact definition that says that it is UB? Tree Borrows is a proposal for exactly such a definition.
"code can do things like this" here means "you can write this code and compile it and run it and it will do something, and unless we have something like Tree Borrows we have no argument for why there would be anything wrong with this code".
You seem to have already accepted that we need something like Tree Borrows (i.e., we should say code like this is UB). This part of the paper is arguing why we need something like Tree Borrows. :)
> Unless I'm just misunderstanding the point they're trying to make here.
You misunderstand the word "can". Yes, you can, in unsafe code, do that. And yes, that is undefined behaviour ;)
https://play.rust-lang.org/?version=stable&mode=debug&editio...
You're already getting a lot of replies, and I don't want to pile on, but I think the clearest way to see the intent there is at the start of the following paragraph:
> Given that aliasing optimizations are something that the Rust compiler developers clearly want to support, we need some way of “ruling out” counterexamples like the one above from consideration.
I believe that's exactly the point: it's too easy to violate constraints like not allowing multiple mutable references. Unsafe is meant for cases where the validity of the code is difficult to prove with rust's lifetime analysis, but can be abused to do much more than that.
"can do things" in this case doesn't mean "is allowed to do things".
"Unsafe code allows to express the following, which is UB:"
Just realised that one of the author, Neven Villani, is Cédric Villani's (Fields Medal 2010) son. Apples don't fall far from the tree, indeed.
> Apples don't fall far from the tree, indeed.
And one could say that they borrow from the tree some of their qualities. Sorry, couldn't resist.
Hey, I used to have an office close to the dad's :)
That's before he went into politics, though.
This looks excellent. I will probably implement this model for my own language.
It can't be a dejavu. I keep seeing this post every 2-3 months...
The paper is years in the making. This is it finally being published.
[dead]
_Tree Borrows_
"I want my fuckin' money back."
"Hoom, hmm, let us not be hasty!"
"You got 48 hours to deliver or the sapling gets it, Treebeard."