Comment by SkiFire13
1 day ago
> I have an execution environment, Wasm, where doing this is pretty well defined, in fact. So if I want to read the memory at address 12345, which is within bounds of the linear memory (and there's a builtin to make sure), why should it be undefined behavior?
How would you define it? Especially in a way that is consistent with the rest of the language and allows common optimizations (remember that C supports variables, which may or may not be stored in memory)?
Just read whatever is at address 12345 of the linear memory. Doesn't matter what that is. If it's an object, if it was malloc'ed, if it's the "C stack", a "global".
It's the only way to interpret *(uint64_t*)(12345) when the standard says that an integer-to-pointer conversion is "intended to be consistent with the addressing structure of the execution environment".
There exists an instruction to do that load in Wasm, there's a builtin to check that 12345 points to addressable memory, the load is valid at the assembly level, the standard says the implementation should define this to be consistent with the addressing structure of the execution environment, why the heck are we playing games and allowing the compiler to say, "nope, that's not valid, so your entire program is invalid, and we can do whatever we want, no diagnostic required"?
If a newer version of that value is also stored in a register and not yet flushed to memory, should the compiler know to insert that flush for you, or is reading a stale value ok?
For what it’s worth there’s a reason you’re supposed to do this kind of access through memcpy, not by dereferencing made up pointers.
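A minimal sketch of the memcpy-style access mentioned above, under loud assumptions: `linear_memory`, `LINEAR_MEMORY_SIZE`, `in_bounds`, `load_u64` and `store_u64` are hypothetical names, and linear memory is modeled as an ordinary byte array rather than a raw address. This only shows the pattern of copying bytes in and out instead of dereferencing a `uint64_t*`; it does not settle whether the same thing is defined for a made-up address like 12345.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for Wasm linear memory: a plain byte array. */
enum { LINEAR_MEMORY_SIZE = 65536 };
static unsigned char linear_memory[LINEAR_MEMORY_SIZE];

/* Returns 1 if [addr, addr + len) lies inside the modeled memory. */
static int in_bounds(size_t addr, size_t len)
{
    return len <= LINEAR_MEMORY_SIZE && addr <= LINEAR_MEMORY_SIZE - len;
}

/* Load 8 bytes at addr by copying them, without forming a uint64_t*. */
static uint64_t load_u64(size_t addr)
{
    uint64_t value;
    assert(in_bounds(addr, sizeof value));
    memcpy(&value, &linear_memory[addr], sizeof value);
    return value;
}

/* Store 8 bytes at addr the same way. */
static void store_u64(size_t addr, uint64_t value)
{
    assert(in_bounds(addr, sizeof value));
    memcpy(&linear_memory[addr], &value, sizeof value);
}
```

Because the copy goes through `unsigned char` storage, it works at any alignment (12345 is odd) and does not depend on what type was last written there.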
> There exists an instruction to do that load in Wasm, there's a builtin to check that 12345 points to addressable memory, the load is valid at the assembly level, the standard says the implementation should define this to be consistent with the addressing structure of the execution environment, why the heck are we playing games and allowing the compiler to say, "nope, that's not valid, so your entire program is invalid, and we can do whatever we want, no diagnostic required"?
Because the language standard is defined to target a virtual machine as output, not any given implementation. That virtual machine is then implemented on various platforms, but the capabilities of the underlying system aren’t directly accessible - they are only there to implement the C virtual machine. That’s why C can target so many different target machines.
> If a newer version of that value is also stored in a register and not yet flushed to memory, should the compiler know to insert that flush for you, or is reading a stale value ok?
Any value would be OK. There are aliasing rules to follow, and it's OK if those crater performance when you start using integer-to-pointer conversions a lot. Is that a problem? But in this instance, assume I don't even care.
> For what it’s worth there’s a reason you’re supposed to do this kind of access through memcpy, not by dereferencing made up pointers.
Then why allow integers to be converted to pointers at all, say it's implementation defined, and meant to represent the addressing structure of the environment?
> Because the language standard is defined to target a virtual machine as output, not any given implementation …
Again, not talking about this being portable. The standard says it's implementation defined, and meant to match the addressing structure of the platform. I offered a specific platform where all of this has a specific meaning, that's all.
What's the point of specifying this, if you're then going to say _actually_ because of aliasing it's “undefined” and as soon as that magic word appears, a smart compiler that can prove it at compile time decides this code can't possibly be reached, and deletes the entire function?
What good does this bring us if it means clang can't be used to target platforms where direct memory access is a thing?