← Back to context

Comment by pbsd

3 years ago

This has nothing to do with undefined behavior. Switch the code to unsigned integers, where overflow is perfectly defined as wrapping, and the result is exactly the same.

The compiler, to avoid the division, compares x * 0x1ff with 512 * 0xffff instead of x * 0x1ff / 0xffff with 512. This comparison is obviously bunk, since it doesn't take the uppermost bits of the multiplication into account. But so is the original comparison!

You see the same thing happen in Rust -- https://rust.godbolt.org/z/xf67rM77T -- the difference being that there's a second language-level bounds-check inserted at the lookup site.

The author took issue with the compiler removing the i >= 0 part. The compiler did so because it could infer it's true given x >= 0, the only way i < 0 could be true is via integer overflow.

I think the output of the Rust compiler is fine because it uses an unsigned comparison ("ja" as opposed to "jl") to implement the "if".

  • In the example, the crash occurred because the compiler removed the i<512 part because without overflow that condition is always true.

    • Despite the appearance (the printf says i=62183), the compiler is not removing the check i < 512. That's clear from the decompiled code.

      What actually happens is that 50000000*511 is 0x5F2E60F80, which is -219803776 when truncated to 32-bits and treated as signed. At the C abstract machine level this was UB, but at assembly level this is the value that is stored in the register and used for subsequent computations.

      The compiler then says "this must be positive because I had already checked 50000000 and it was". So it only performs a signed check -219803776 < 512*511, which passes.

      It's only inside the "if" that the code divides -219803776 by 65535. One could plausibly expect the result to be -3353, but the compiler decides to use an unsigned divide (again, it can do so because the input "must" be less than 2^31) and therefore it returns 61283.

      But the test that's being removed is i>=0, not i<512.

      2 replies →

It doesn’t matter what x is. Without prior undefined behavior, there is no way to justify “if (i >= 0 && i < sizeof(tab))” passing when (as demonstrated by the printf) i is not actually in that range.

Edit: Though, incidentally, the comparison does not work the way it was probably intended to work. In `i < sizeof(tab)`, `i` is converted to `size_t`, so an unsigned comparison is performed, making the `i >= 0` part redundant. But the result is the same as what was intended.

  • That is not how undefined behavior works in C (or C++).

    Effects of UB are not temporal or spacial limited to the place where undefined behavior happens.

    The moment you enter a compilation unit (assuming no link optimizations) with a state which at some point will run into undefined behavior all bets are of.

    EDIT: Yes, UB can "time travel". Compared to that ignoring an if condition iff the UB code was triggered is harmless. Similar it can also "split realities". E.g. a value produced by UB might at one place have the value 1 and at another place a completely different value. E.g. unsigned int overflow values might for an if condition have one value and for the print statment in the condition another and for the index operation again a different value.

    EDIT2: Which is why a lot of people which have proper understanding of C++ and don't have a sunken (learn C++) cost fallacy came to the conclusion that using C++ is a bad choice for most use-case.

    • > The moment you enter a compilation unit (assuming no link optimizations) with a state which at some point will run into undefined behavior all bets are of. [...] Yes, UB can "time travel"

      Close, but not quite. This is a common misconception in the reverse direction.

      Abstractly, what UB can do is performing the inverse of the preceding instructions, effectively making the abstract machine run in reverse. However, this is only equivalent to "time-traveling" until you get to the point of the last side effect (where "side effect" here refers to predefined operations in the standard that interact with the external world, such as I/O and volatile accesses), because only everything since that point can be optimized away under the as-if rule without altering the externally visible effects of the program.

      As a concrete, practical example, this means the following: if you do fflush(stdout); return INT_MAX + 1; the compiler cannot omit the fflush() call merely because the subsequent statement had undefined behavior. That is, the UB cannot time-travel to before the flush. What the program can do is to write garbage to the file afterward, or attempt to overwrite what you wrote in the file to revert it to its previous state, but the fflush() must still occur before anything wild happens. If nobody observes the in-between state, then the end result can look like time-travel, but if the system blocks on fflush() and the user terminates the program while it's blocked, there is no opportunity for UB.

      20 replies →

    • To hammer it home: UB isn't restricted to a variable having a funny value. Your C program is allowed to play Nethack on startup, if the compiler can prove that a few hours into your program, there would be UB.