Comment by motorest

1 year ago

> Your framing of a compiler exploiting UB in programs to gain performance, has an undeserved negative connotation. The fact is, programs are mathematical structures/arguments, and if any single step in the program code or execution is wrong, no matter how small, it can render the whole program invalid.

You're failing to understand the problem domain, and consequently you're oblivious to how UB is actually a solution to problems.

There are two sides to UB: the one which is associated with erroneous programs, because clueless developers unwittingly do things that the standards explicitly states that lead to unknown and unpredictable behavior, and the one which leads to valid programs, because developers knowingly adopted an implementation that specifies exactly what behavior they should expect from doing things that the standards specify as UB.

Somehow, those who mindlessly criticize UB only parrot the simplistic take on UB, the "nasal demons" blurb. They don't even stop to think about what is undefined behavior and why would a programming language specification purposely leave specific behavior as undefined instead of unspecified or even implementation-defined. They do not understand what they are discussing and don't invest any moment trying to understand why things are the way they are, and what problems are solved by them. The just parrot cliches.

14 comments

motorest

cowboylowrez 1 year ago

from the ubc.pdf paper linked in this thread.

    int d[16];
    int SATD (void)
    {
    int satd = 0, dd, k;
    for (dd=d[k=0]; k<16; dd=d[++k]) {
    satd += (dd < 0 ? -dd : dd);
    }
    return satd;
    }

This was “optimized” by a pre-release of gcc-4.8 into the following infinite loop: SATD: .L2: jmp .L2

(simply because k<16 is always true because k is used as an index to an array with a known size)

I mean thats just sort of nuts, how do you loop over an array then in an UB free manner? The paper referred to this situation being remediated:

"The GCC maintainers subsequently disabled this optimization for the case occuring in SPEC"

I try to keep up with the UB thing, while for current code I just use o0 because its fast enough and apparently allows me to keep an array index in bounds. Reading about this leaves me thinking that some of this UB criticism might not be so mindless.

tyg13 1 year ago
Leaving aside the fact that that code reads an array out-of-bounds (which is not a trivial security issue) that's a ridiculously obtuse way to write that code. For loop conditions should be almost always be expressed in terms of their induction variable. A much cleaner and safe version is
int d[16]; int SATD (void) { int satd = 0, k = 0; for (k = 0; k < 16; ++k) satd += d[k] < 0 ? -d[k] : d[k]; return satd; }
- cowboylowrez 1 year ago
  
  I don't understand your logic though, according to my obviously inadequate understanding of UB, "k < 16" can never be false because its used as an array index for d[16]??? is the critical difference here that the array access happens outside of the "for" expression?
  
  1 reply →
dapperdrake 1 year ago
After thinking about it and reading the other comments, the for loop needs to be changed (k<15), because ++k sets k to 16 in the last step of the version in the parent comment.
This one is nasty.
And it will still cause trouble close to arrays if size INT_MAX.
Fun times.
- nayuki 1 year ago
  
  The problem isn't ++k per se. The problem is the expression d[++k], which immediately reads d[16] before realizing that `k < 16` is false. Look at your sibling comments for explanations and corrections: https://news.ycombinator.com/item?id=43553519
  > And it will still cause trouble close to arrays if size INT_MAX.
  Fun fact, `for (int i = 0; i <= INT_MAX; i++) {}` is undefined behavior in C/C++ but a well-defined infinite loop in Java.
  
  1 reply →
nayuki 1 year ago
Reference: https://c9x.me/compile/bib/ubc.pdf#page=4
Both the parent comment and the referenced paper fail to mention the out-of-bounds access of d[16]. At best, the paper says:
> The compiler assumed that no out-of-bounds access to d would happen, and from that derived that k is at most 15 after the access
Here is my analysis. By unrolling the loop and tracing the statements and values, we get:
k = 0; dd = d[k]; k is 0; k < 16 is true; loop body; ++k; k is 1; dd = d[k]; k is 1; k < 16 is true; loop body; ++k; k is 2; dd = d[k]; k is 2; k < 16 is true; loop body; ++k; k is 3; dd = d[k]; k is 3; k < 16 is true; loop body; ++k; k is 4; dd = d[k]; k is 4; k < 16 is true; loop body; ++k; k is 5; dd = d[k]; k is 5; k < 16 is true; loop body; ++k; k is 6; dd = d[k]; k is 6; k < 16 is true; loop body; ++k; k is 7; dd = d[k]; k is 7; k < 16 is true; loop body; ++k; k is 8; dd = d[k]; k is 8; k < 16 is true; loop body; ++k; k is 9; dd = d[k]; k is 9; k < 16 is true; loop body; ++k; k is 10; dd = d[k]; k is 10; k < 16 is true; loop body; ++k; k is 11; dd = d[k]; k is 11; k < 16 is true; loop body; ++k; k is 12; dd = d[k]; k is 12; k < 16 is true; loop body; ++k; k is 13; dd = d[k]; k is 13; k < 16 is true; loop body; ++k; k is 14; dd = d[k]; k is 14; k < 16 is true; loop body; ++k; k is 15; dd = d[k]; k is 15; k < 16 is true; loop body; ++k; k is 16; dd = d[k]; OUT OF BOUNDS!
As long as we enter the loop, the loop must eventually execute undefined behavior. Furthermore, every instance of testing `k < 16` is true before we hit UB. Therefore it can be simplified to true without loss of functionality, because after we hit UB, we are allowed to do absolutely anything. In my ancestor post where I said that any mistake, no matter how small, can have unbounded consequences, I fully mean it and believe it.
Please stop blaming the compiler. The problem is buggy code. Either fix the code, or fix the language specification so that wild reads either return an arbitrary value or crashes cleanly at that instruction.
Note that we cannot change the spec to give definite behavior to writing out of bounds, because it is always possible to overwrite something critical like a return address or an instruction, and then it is literally undefined behavior and anything can happen.
> I mean thats just sort of nuts, how do you loop over an array then in an UB free manner?
The code is significantly transformed, but the nasty behavior can be prevented by designing code that does not read out of bounds! The trick is that the test `k < 16` must be false before any attempt to read/write `d[k]`. Which 99.99% of programmers get right, especially by writing a loop in the standard way and not in the obtuse way demonstrated in the referenced code. The obvious and correct implementation is:
for (int k = 0; k < 16; k++) { int dd = d[k]; satd += dd < 0 ? -dd : dd; }
The fact that the SPEC code chose to load `d[k]` before checking that `k` is still in bounds is an overly clever, counterproductive "jumping the gun" tactic. Putting assignment statements into indexing expressions is also needless obfuscation (which I untangled in the unrolled analysis).
- cowboylowrez 1 year ago
  
  according to my understanding UB mechanics says "k < 16" is always true because its used as an index into d[16]. obviously my understanding is wrong because of your correct code allows it, so I'm just looking for the takeaway here. why is yours not UB when original is? Is there some magic going on that the compiler knows that the value 16 will be used as an index?
  is the compiler doing a loop unroll and study that parallels your analysis?
  aside from trying to understand this, it also seems a bit nuts that these c guys remediated that behavior by allowing an out of bounds index access with their change (?). so are they saying somehow that in this one case, the out of bounds should be allowed?
  
  2 replies →

rollcat 1 year ago

Perhaps I'm spoiled by ever so slightly higher-level languages, but it seems your entire point is that if a program is ever so slightly incorrect, the programmer (and/or the end user) should suffer all of the consequences.

From where I stand, compilers are tools to aid the programmer. We invented them, because we found out that it was more productive than writing machine code by hand[1]. If an off-by-one error or a null pointer dereference[2] in a trivial program can invoke time travel several frames up the call stack[3], it isn't just missing the entire point of having a compiler - it can drive people insane.

[1]: https://en.wikipedia.org/wiki/Grace_Hopper#UNIVAC

[2]: https://en.wikipedia.org/wiki/Tony_Hoare#Research_and_career

[3]: https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...

As far as I can tell, no popular language created in the past 30 years (including those with official specs and multiple implementations) makes heavy use of UB.

nayuki 1 year ago
> no popular language created in the past 30 years makes heavy use of UB
Yeah, I'm happy that Rust's list is relatively short, with approximately 10 items on it: https://doc.rust-lang.org/reference/behavior-considered-unde...
Also relevant reading: https://doc.rust-lang.org/nomicon/what-unsafe-does.html
- steveklabnik 1 year ago
  
  That list for Rust is not necessarily comprehensive. What matters more than the number is the segregation; you can’t cause UB from safe rust, only unsafe rust.