“Implementation defined behaviour”: compiler author chooses, and documents the choice.
A lot of UB should be implementation defined behaviour instead; this would much better match programmers’ intuitions as they reason about their code - you can even see it in the comments of this post: it’s always things like “this hardware supports / doesn’t support unaligned accesses”, it’s never nasal demons.
That isn't true, for UB the compiler is allowed to assume the UB can never happen. For example if you dereference a pointer and only after check if it is NULL, the compiler can remove the NULL check, since it is clearly impossible (nevermind that you might be on a microcontroller where NULL is a valid address).
The fallout of this are quite large!
If behaviour is implementation defined the compiler has to stick to one consistent behaviour. No such need for UB, you can get different behaviour bu changing unrelated code, by changing between debug and release or just because of what garbage happened to be on the stack.
Since the compiler is allowed to assume the UB doesn't happen it will also sometimes look like the compiler miscompiled your code elsewhere, but what actually happened was some inlining followed by extrapolating "this can never happen".
UB is often surprising: I have seen unaligned loads crash on x86 due to it bring UB in C (even though x86 is generally fine with it). But once a newer compiler decided that it was fine to vectorise that code (since it clearly aligned) the CPU was no longer happy with it.
You're missing the point. Volatile forces two loads of a value that may have changed in the middle. So the value of "x" may depend on the time/order of load.
Which is, if I understand correctly, the entire point of volatile. Don't use it if you don't want that behavior.
And in fact, in the example given, if there is something (another thread or whatever) that can change the value of x, then you don't know what either number will be. Well, in that circumstance, without volatile, it may print the same number both times, but you still don't know what the number will be (unless the read gets optimized away entirely).
Why is that missing the point? Loading it twice, possibly with different values, is the intended behavior. It's only undefined because the C spec doesn't specify the order of the loads (unlike most other languages which have a perfectly well-defined order for side effects in a single expression).
Why? Just for this edge case? It could be faster and/or allow smaller code size to allow this to be undefined.
Undefined is also different from "depends on the compiler", because which behavior is chosen can even depend on the circumstances, whatever code appears before and/or after it.
That said, UB in code, such as this example of ordering of reads of volatile parameters being undefined, does not automatically mean that code that uses it is bad. It may very well be that the function being called doesn't misbehave either way.
That’s the point of the whole article. It’s not worth the speed gain to have a language that nobody can safely use because you can’t really prevent UB when you write it.
> It may very well be that the function being called doesn't misbehave either way.
The function being good or bad has nothing to do with the UB. The UB occurs before the function is called.
There is no uninitialized variable, I explicitly initialized it to 5.
And yes indeed, C could do what Rust does and define the order of evaluation for function arguments.
If the argument expressions are indeed side-effect-less, the compiler can always make use of the "as-if" rule and legally reorder the computation anyway, for example to alleviate register pressure.
“Implementation defined behaviour”: compiler author chooses, and documents the choice.
A lot of UB should be implementation defined behaviour instead; this would much better match programmers’ intuitions as they reason about their code - you can even see it in the comments of this post: it’s always things like “this hardware supports / doesn’t support unaligned accesses”, it’s never nasal demons.
I told someone at a conference that UB actually means "implementation-defined, no documentation required". He started to refute me and then stopped.
That isn't true, for UB the compiler is allowed to assume the UB can never happen. For example if you dereference a pointer and only after check if it is NULL, the compiler can remove the NULL check, since it is clearly impossible (nevermind that you might be on a microcontroller where NULL is a valid address).
The fallout of this are quite large! If behaviour is implementation defined the compiler has to stick to one consistent behaviour. No such need for UB, you can get different behaviour bu changing unrelated code, by changing between debug and release or just because of what garbage happened to be on the stack.
Since the compiler is allowed to assume the UB doesn't happen it will also sometimes look like the compiler miscompiled your code elsewhere, but what actually happened was some inlining followed by extrapolating "this can never happen".
UB is often surprising: I have seen unaligned loads crash on x86 due to it bring UB in C (even though x86 is generally fine with it). But once a newer compiler decided that it was fine to vectorise that code (since it clearly aligned) the CPU was no longer happy with it.
Except that UB doesn't mean that. UB means "the developer must never write this".
2 replies →
Print x twice. Not all “side effects” care about order.
Better yet, define an order for parameter evaluation.
You're missing the point. Volatile forces two loads of a value that may have changed in the middle. So the value of "x" may depend on the time/order of load.
Which is, if I understand correctly, the entire point of volatile. Don't use it if you don't want that behavior.
And in fact, in the example given, if there is something (another thread or whatever) that can change the value of x, then you don't know what either number will be. Well, in that circumstance, without volatile, it may print the same number both times, but you still don't know what the number will be (unless the read gets optimized away entirely).
4 replies →
Why is that missing the point? Loading it twice, possibly with different values, is the intended behavior. It's only undefined because the C spec doesn't specify the order of the loads (unlike most other languages which have a perfectly well-defined order for side effects in a single expression).
2 replies →
Couldn’t you just define that function arguments are evaluated left to right?
Or just throw an error.
Why? Just for this edge case? It could be faster and/or allow smaller code size to allow this to be undefined.
Undefined is also different from "depends on the compiler", because which behavior is chosen can even depend on the circumstances, whatever code appears before and/or after it.
That said, UB in code, such as this example of ordering of reads of volatile parameters being undefined, does not automatically mean that code that uses it is bad. It may very well be that the function being called doesn't misbehave either way.
That’s the point of the whole article. It’s not worth the speed gain to have a language that nobody can safely use because you can’t really prevent UB when you write it.
> It may very well be that the function being called doesn't misbehave either way.
The function being good or bad has nothing to do with the UB. The UB occurs before the function is called.
I meant reading the uninitialized variable
There is no uninitialized variable, I explicitly initialized it to 5.
And yes indeed, C could do what Rust does and define the order of evaluation for function arguments.
If the argument expressions are indeed side-effect-less, the compiler can always make use of the "as-if" rule and legally reorder the computation anyway, for example to alleviate register pressure.
HCF
I have good news about what UB allows
What is that?
A fictitious assembly instruction (and pretty good TV series).
https://en.wikipedia.org/wiki/Halt_and_Catch_Fire_(computing...
Halt and Catch Fire
Compilation error
It’s hard to detect all UB at compile time
It’s harder depending on the language, which is clearly the point.