Comment by jcranmer
1 day ago
In a compiler, you essentially need the ability to trace all the uses of an address, at least in the easy cases. Converting a pointer to an integer (or vice versa) isn't really a deal-breaker; it's essentially the same thing as passing (or receiving) a pointer to an unknown external function: the pointer escapes, whelp, nothing more we can do in that case for the most part.
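To make the "pointer escapes" case concrete, here is a minimal sketch in C. The function `unknown_callee` is a hypothetical stand-in for an external function the compiler can't see into (it's defined in the same file here only so the example is complete), and the other names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for an "unknown external function".
   Pretend the compiler cannot see this body. */
void unknown_callee(uintptr_t bits) {
    int *p = (int *)(void *)bits;   /* same value round-tripped back to a pointer */
    *p += 1;
}

int never_escapes(void) {
    int local = 41;                 /* address never taken: every use is visible here */
    return local + 1;               /* the compiler is free to fold this to a constant */
}

int escapes_as_integer(void) {
    int local = 41;
    /* The address leaves as an integer. For the optimizer this is much like
       passing &local itself: it has to assume the callee may modify local. */
    unknown_callee((uintptr_t)(void *)&local);
    return local;                   /* must be re-read after the call */
}
```

Both functions return 42, but only in the second one does the compiler have to give up on tracking `local` across the call.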
But converting an integer to a pointer creates a problem if you allow that pointer to point to anything--it breaks all of the optimizations that assumed they could trace all of the uses of an address. So you need something like provenance to say that certain back-conversions are illegal. The most permissive model is a no-address-taken model (you can't forge a pointer to a variable whose address was never taken). But most compilers opt instead for a data-dependency-based model: essentially, even integer-based arithmetic on addresses isn't allowed to go out of bounds of the original object at the point of dereference. Or at least, they claim to--the documentation for both gcc and llvm makes this claim, but both have miscompilation bugs because their implementations don't actually follow it.
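As a rough illustration of what a data-dependency-based model permits, the sketch below does its address arithmetic on integers but stays inside the object the address came from. It assumes a flat address space where adding N to a `uintptr_t` moves N bytes, which holds on mainstream gcc and clang targets but is not guaranteed by ISO C; the function name is made up.

```c
#include <stdint.h>

/* Assumption: flat address space, so integer arithmetic on a uintptr_t
   corresponds to byte offsets. ISO C leaves this mapping
   implementation-defined. */
int read_third_via_integer(void) {
    int a[4] = {10, 20, 30, 40};

    uintptr_t base = (uintptr_t)(void *)&a[0];
    uintptr_t addr = base + 2 * sizeof(int);   /* derived from a, still inside a */

    int *p = (int *)(void *)addr;
    return *p;                                 /* reads a[2] */

    /* Deliberately not shown: base plus an offset that happens to land in
       some other object b. Under a data-dependency model that integer still
       derives from a, so dereferencing it counts as out of bounds even if
       the numeric address matches b. */
}
```

The in-bounds case is the one both the documented models and the actual implementations agree on; the disagreement is about the case described in the trailing comment.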
The proposal for pointer provenance in C essentially looks at how compilers generally implement things and suggests a model that's closer to their actual implementation: pointer-to-integer exposes the address such that any integer-to-pointer can point to it. Note this is more permissive than the claimed models of compilers today--you're explicitly able to violate out-of-bounds rules here, so long as both objects have had their addresses exposed. There's some resistance to this because adhering to this model also breaks other optimizations (for example, (void*)(uintptr_t)x is not the same as x).
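Here is a small sketch of the kind of code the exposure-based model is meant to bless. Both objects have their addresses exposed by a pointer-to-integer cast, and the integer that gets cast back depends on both of them, so a pure data-dependency reading has no single obvious source object. The names are hypothetical, and the claim that this is well-defined is my reading of the proposal, not a quote from it.

```c
#include <stdint.h>

static int first = 1;
static int second = 2;

int write_through_forged(void) {
    /* Casting to an integer exposes each object's address. */
    uintptr_t a = (uintptr_t)(void *)&first;
    uintptr_t b = (uintptr_t)(void *)&second;

    /* Numerically equal to b, but computed from both a and b
       (the same trick XOR-linked lists rely on). */
    uintptr_t chosen = (a ^ b) ^ a;

    /* Under the exposure model, the integer-to-pointer cast may point at
       any exposed object living at that address, here second. */
    int *p = (int *)(void *)chosen;
    *p = 20;

    return second;
}
```

This is also where the cost shows up: since the cast result may alias any exposed object, the compiler can't simply replace `(void*)(uintptr_t)x` with `x`, which only ever carried the provenance of its original object.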
As a practical matter, pointer provenance isn't that big of a deal. It's not hard to come up with examples that illustrate behaviors that cause miscompilation or are undefined specifically because of pointer provenance. But I'm not aware of any application code that was actually miscompiled because the compiler implemented its provenance model incorrectly. The issue gets trickier as you move into systems code that exists somewhat outside the C object model, but even then, most of the relevant code can ignore the fact that it lives outside the object model, since the resulting miscompiles are prevented by inherent optimization barriers anyways (note that to get a miscompile, you generally have to simultaneously forge the object's address, have the object's address be known to the compiler already, and have the compiler think the object's address wasn't exposed by other means).