Comment by comex

3 years ago

It doesn’t matter what x is. Without prior undefined behavior, there is no way to justify “if (i >= 0 && i < sizeof(tab))” passing when (as demonstrated by the printf) i is not actually in that range.

Edit: Though, incidentally, the comparison does not work the way it was probably intended to work. In `i < sizeof(tab)`, `i` is converted to `size_t`, so an unsigned comparison is performed, making the `i >= 0` part redundant. But the result is the same as what was intended.
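The conversion described in the edit above can be sketched as follows. This is only an illustration: the size of `tab` and the function names `in_range_original` and `in_range_unsigned_only` are hypothetical, and it assumes the usual situation where `size_t` is at least as wide as `int`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table; the real size of `tab` is not given in the thread. */
static char tab[16];

/* The check as written: for `i < sizeof(tab)`, i is converted to size_t. */
bool in_range_original(int i) {
    return i >= 0 && i < sizeof(tab);
}

/* The unsigned comparison alone: a negative i converts to a huge size_t
   value, so it is rejected without the explicit `i >= 0` test. */
bool in_range_unsigned_only(int i) {
    return (size_t)i < sizeof(tab);
}
```

Both functions are fully defined for every `int` and agree on every input, which is why the `i >= 0` part is redundant but harmless.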

That is not how undefined behavior works in C (or C++).

Effects of UB are not temporally or spatially limited to the place where the undefined behavior happens.

The moment you enter a compilation unit (assuming no link optimizations) with a state which at some point will run into undefined behavior, all bets are off.

EDIT: Yes, UB can "time travel". Compared to that, ignoring an if condition iff the UB code was triggered is harmless. Similarly, it can also "split realities": a value produced by UB might have the value 1 in one place and a completely different value in another. For example, the result of a signed int overflow might have one value in an if condition, another in the print statement inside that if, and yet another in the index operation.
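A minimal sketch of how such an inconsistency can arise, assuming signed `int` overflow (which is undefined, whereas unsigned arithmetic wraps by definition). The function names are hypothetical. An optimizer may assume `x + 1` never overflows and fold the first comparison to `true`, even though an `x + 1` actually computed at run time on typical hardware would wrap to a negative number.

```c
#include <limits.h>
#include <stdbool.h>

/* Calling this with x == INT_MAX is undefined behavior. A compiler may
   therefore assume the addition never overflows and treat the whole
   expression as always true. */
bool next_is_greater(int x) {
    return x + 1 > x;
}

/* Well-defined alternative: answer the same question without performing
   an addition that could overflow. */
bool next_is_greater_checked(int x) {
    return x < INT_MAX;
}
```

The two functions agree wherever the first one is defined; only the second may be called with `INT_MAX`.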

EDIT2: Which is why a lot of people who have a proper understanding of C++ and aren't caught in a sunk-cost fallacy (from having learned C++) came to the conclusion that using C++ is a bad choice for most use cases.

  • > The moment you enter a compilation unit (assuming no link optimizations) with a state which at some point will run into undefined behavior, all bets are off. [...] Yes, UB can "time travel"

    Close, but not quite. This is a common misconception in the reverse direction.

    Abstractly, what UB can do is perform the inverse of the preceding instructions, effectively making the abstract machine run in reverse. However, this is only equivalent to "time-traveling" until you get to the point of the last side effect (where "side effect" here refers to predefined operations in the standard that interact with the external world, such as I/O and volatile accesses), because only what has happened since that point can be optimized away under the as-if rule without altering the externally visible effects of the program.

    As a concrete, practical example, this means the following: if you do fflush(stdout); return INT_MAX + 1; the compiler cannot omit the fflush() call merely because the subsequent statement had undefined behavior. That is, the UB cannot time-travel to before the flush. What the program can do is to write garbage to the file afterward, or attempt to overwrite what you wrote in the file to revert it to its previous state, but the fflush() must still occur before anything wild happens. If nobody observes the in-between state, then the end result can look like time-travel, but if the system blocks on fflush() and the user terminates the program while it's blocked, there is no opportunity for UB.

    • The program can logically undo the call to fflush, too, mainly by not dispatching it at all: UB is a global program attribute, at least currently. (People have made proposals to change this, but I don't think they have gone anywhere.)

    • Something I should add here in hindsight is that I've been rather sloppy in this discussion with a few details, and perhaps they're worth clarifying. For example, despite me using them interchangeably, "observable behavior" is not the same thing as "side effects", and you really have to refer to the standard and your implementation to see what constitutes observable behavior. For example, fflush() may in fact be elidable if the compiler can prove the file is unbuffered (and it wouldn't even need UB for that). Similarly, if the compiler can prove fflush() has no observable behavior (i.e. it is guaranteed to return without raising signals, terminating the program, etc.) then it may be able to elide the call in the UB case as well. In practice this isn't usually possible to guarantee given fflush() performs an opaque system call, but it may be more possible in a freestanding implementation than in a hosted one.

      Ultimately, my point here wasn't about fflush() or even about the specifics of what exactly constitutes observable behavior in the abstract machine. (I do recall that writes to volatile variables were among them, but you'd have to check all of them to be sure.) Rather, my basic point was the fact (tautology?) that any interactions with the external world that affect the program's observable behavior necessarily must be allowed to happen before the program can "know" for certain that the execution path will trigger UB, which by definition isn't possible when one of the intervening operations is an opaque call.

    • > if you do fflush(stdout); return INT_MAX + 1; the compiler cannot omit the fflush() call merely because the subsequent statement had undefined behavior

      False! The expression (INT_MAX + 1) has no side effect (assuming no UB), so according to the rules of the C abstract machine, the compiler is allowed to hoist this calculation above the fflush(). If you run this on a machine that traps on integer overflow (which is allowed behavior), the process could crash before the fflush() is executed. Remember, everyone: With UB, anything can happen.

  • To hammer it home: UB isn't restricted to a variable having a funny value. Your C program is allowed to play Nethack on startup, if the compiler can prove that a few hours into your program, there would be UB.
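For reference, the fflush() disagreement in the sub-thread above can be written out as two functions. This is only a sketch with hypothetical names. The first follows the source order from the example; the second shows the hoisted order that the last reply to it describes. Whether a compiler may actually turn the first into the second is exactly what the replies dispute, so the code takes no side on that.

```c
#include <stdio.h>

/* Source order from the example: flush first, then compute.
   The addition is undefined only when x == INT_MAX. */
int flush_then_increment(int x) {
    fflush(stdout);
    return x + 1;
}

/* The hoisted order described in the reply: the addition has no side
   effect when it is defined, so it is computed before the flush. On a
   machine that traps on signed overflow, the trap would then occur
   before fflush() runs. */
int increment_then_flush(int x) {
    int r = x + 1;
    fflush(stdout);
    return r;
}
```

For every input where the addition is defined, the two functions are indistinguishable from outside; they can differ only on the input that triggers UB.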