← Back to context

Comment by dataflow

3 years ago

> The moment you enter a compilation unit (assuming no link optimizations) with a state which at some point will run into undefined behavior all bets are of. [...] Yes, UB can "time travel"

Close, but not quite. This is a common misconception in the reverse direction.

Abstractly, what UB can do is performing the inverse of the preceding instructions, effectively making the abstract machine run in reverse. However, this is only equivalent to "time-traveling" until you get to the point of the last side effect (where "side effect" here refers to predefined operations in the standard that interact with the external world, such as I/O and volatile accesses), because only everything since that point can be optimized away under the as-if rule without altering the externally visible effects of the program.

As a concrete, practical example, this means the following: if you do fflush(stdout); return INT_MAX + 1; the compiler cannot omit the fflush() call merely because the subsequent statement had undefined behavior. That is, the UB cannot time-travel to before the flush. What the program can do is to write garbage to the file afterward, or attempt to overwrite what you wrote in the file to revert it to its previous state, but the fflush() must still occur before anything wild happens. If nobody observes the in-between state, then the end result can look like time-travel, but if the system blocks on fflush() and the user terminates the program while it's blocked, there is no opportunity for UB.

The program can logically undo the call to fflush, too. Mainly by not dispatching it at all–UB is a global program attribute, at least currently. (People have made proposals to change this, but I don't think they have gone anywhere.)

  • No, it cannot, and UB is not a global program property. The C standard defines valid program executions according to the behaviors of the abstract machine. UB is a property of an execution of the program given some inputs.

    • Yes, sorry for not being precise: UB applies to executions. When I said "global" I meant global over that entire execution, so if your path ends up hitting undefined behavior it can go back and logically undo its entire execution, including parts which it shared with a well-defined execution or where you'd generally expect side effects to be placed.

      13 replies →

Something I should add here in hindsight is that I've been rather sloppy in this discussion with a few details, and perhaps they're worth clarifying. For example, despite me using them interchangeably, "observable behavior" is not the same thing as "side effects", and you really have to refer to the standard and your implementation to see what constitutes observable behavior. For example, fflush() may in fact be elidable if the compiler can prove the file is unbuffered (and it wouldn't even need UB for that). Similarly, if the compiler can prove fflush() has no observable behavior (i.e. it is guaranteed to return without raising signals, terminating the program, etc.) then it may be able to elide the call in the UB case as well. In practice this isn't usually possible to guarantee given fflush() performs an opaque system call, but it may be more possible in a freestanding implementation than in a hosted one.

Ultimately, my point here wasn't about fflush() or even about the specifics of what exactly constitutes observable behavior in the abstract machine. (I do recall writes to volatile variables was among them, but you'd have to check all of them to be sure.) Rather, my basic point was the fact (tautology?) that any interactions with the external world that affect the program's observable behavior necessarily must be allowed to happen before the program can "know" for certain that the execution path will trigger UB—which by definition isn't possible when one of the intervening operations is an opaque call.

> if you do fflush(stdout); return INT_MAX + 1; the compiler cannot omit the fflush() call merely because the subsequent statement had undefined behavior

False! The expression (INT_MAX + 1) has no side effect (assuming no UB), so according to the rules of the C abstract machine, the compiler is allowed to hoist this calculation above the fflush(). If you run this on a machine that traps on integer overflow (which is allowed behavior), the process could crash before the fflush() is executed. Remember, everyone: With UB, anything can happen.