Comment by pron

3 days ago

Eliminating undefined behaviour is a means to an end (it reduces problematic bugs, but not all undefined behaviour is equally responsible for such bugs), and it's not a binary thing (virtually all programs need to interact with software written in languages that don't eliminate undefined behaviour, so clearly there's tolerance for the possibility of undefined behaviour).

Don't get me wrong - less undefined behaviour is better. But drawing a binary line between some and none, while it makes for a convenient talking point, isn't necessarily the sweet spot for the complicated and context-dependent series of tradeoffs that is software correctness.

By definition, all C++ Undefined Behaviour is unbounded. You may believe, and you may even have practical evidence from the compilers which happen to exist today, that in some cases the behaviour is in fact bounded, but that's not what the language definition says, and optimisers have a funny way of making fools of people who mistake Undefined for "Eh, it'll probably do what I meant".

It might seem as though incrementing a signed integer past its maximum can't be as problematic as a use after free even though both are Undefined Behaviour, but nah, in practice in real C++ compilers today they can both result in remote code execution.
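To make the signed-integer case concrete, here is a minimal C++ sketch. The function names are invented for illustration. The first check relies on wraparound, which the language does not define for `int`, so an optimiser is permitted to assume `x + 1 < x` is never true. The second asks the same question without ever performing the overflowing addition.

```cpp
#include <cassert>
#include <climits>

// Relies on wraparound. When x == INT_MAX the addition is Undefined
// Behaviour, so a compiler is allowed to fold this to `return false`.
bool next_would_overflow_ub(int x) {
    return x + 1 < x;
}

// Well-defined for every int: compares against the limit, never adds.
bool next_would_overflow(int x) {
    return x == INT_MAX;
}

// Hypothetical helper: increments only when the result is representable.
// Leaves `out` untouched and returns false otherwise.
bool checked_increment(int x, int& out) {
    if (next_would_overflow(x)) {
        return false;
    }
    out = x + 1;
    return true;
}
```

Calling `next_would_overflow_ub(INT_MAX)` is exactly the case the language leaves undefined, so nothing can be promised about its result.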

There is a place for Unspecified results: for example, leaving it unspecified whether a particular arithmetic operation rounds up or down may loosen things up enough that much faster machine code is generated, and, well, the numbers are still broadly correct. But that's not what Undefined Behaviour does.
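The rounding case above is a hypothetical. One instance of unspecified behaviour that C++ actually has is the order in which function arguments are evaluated. A small sketch, with all names invented for illustration: the trace may come out either way, but the program stays meaningful and the sum is the same.

```cpp
#include <cassert>
#include <string>

namespace demo {

std::string trace;  // records the order in which the calls actually ran

int first()  { trace += 'a'; return 1; }
int second() { trace += 'b'; return 2; }
int add(int x, int y) { return x + y; }

// The two argument expressions may be evaluated in either order, but
// each runs exactly once and the sum does not depend on the order.
int sum_with_trace() {
    trace.clear();
    return add(first(), second());
}

}  // namespace demo
```

The set of possible outcomes here is small and known in advance, which is the contrast with Undefined Behaviour being drawn above.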

  • Yes, that's what UB means - the program loses all meaning and any effect is possible - but the importance of a bug is much more than just its blast radius. We ultimately care not about the language semantics of a bug but its expected loss of value. This is also dependent on how frequently the bug appears and how easy it is to prevent or find and fix. Not all UBs are equal on that front.

    Furthermore, an unbounded blast radius isn't itself the direct problem. A bug that with some probability causes your program to crash and your disk to be deleted is far less dangerous than a bug that allows a remote attacker to relatively easily steal all your secrets. UBs also differ on that front.

    And again, virtually no program is provably free of UB. For example, a Java program still interacts with an OS or with some native library that might suffer from a UB. So clearly we do tolerate some probability of UB, and we clearly do not think that eliminating any possibility of UB is worth any price.

    When a program is just code on the screen, it's just a mathematical object, and then it's easy to describe a UB - the loss of all program meaning - as the most catastrophic outcome. But software correctness goes beyond the relatively simple world of programming language semantics, and has to consider what happens when a program is running, at which point it is no longer a mathematical object but a physical one. If a remote attacker steals all our secrets, we don't care if it's a result of some bug in the program itself (due to UB or otherwise), in other software the program interacts with, some fault or weakness in the hardware, or human operator error. The probability of any of these things is never zero, and we have to balance the cost of addressing each of these possibilities.

    To give an example in the context of Carbon, we know that old code tends to suffer from fewer severe bugs than new code. So, if we want to reduce the probability of bugs, it is possible that it may be more worthwhile to invest - say, in terms of language complexity budget - in interop with C++ code than in eliminating every possible kind of UB, including those that are less likely to appear, sneak past testing, and cause an easily exploitable vulnerability.

    • > the most catastrophic outcome

      Nah, this is in the context of C++ so UB isn't the most catastrophic situation. Undefined Behaviour is a behaviour, which means we might well be able to avoid invoking the behaviour and then it's fine. For example if your C++ program has a use-after-free in the code invoked only by the "Add file" feature but otherwise works fine, we can just ensure operators never use "Add file" and the UB, no matter what it might be, won't happen.
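      A rough sketch of that situation, with the command names and the guard invented for illustration. The use-after-free sits only on the `AddFile` path, so a run that never takes that path never invokes the behaviour.

      ```cpp
      #include <cassert>
      #include <memory>

      enum class Command { ListFiles, AddFile };

      // Deliberately buggy: the AddFile branch reads freed memory.
      int run(Command cmd) {
          auto count = std::make_unique<int>(3);
          int* raw = count.get();
          if (cmd == Command::AddFile) {
              count.reset();    // frees the int
              return *raw + 1;  // use-after-free: UB, but only if this branch runs
          }
          return *raw;          // well-defined: the int is still alive here
      }

      // The "operators never use Add file" policy, expressed as a guard.
      int run_guarded(Command cmd) {
          if (cmd == Command::AddFile) {
              return -1;        // feature disabled; the buggy path is never reached
          }
          return run(cmd);
      }
      ```

      Calling `run(Command::AddFile)` directly is the case where anything may happen.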

      C++ also has IFNDR: clauses which say a program is "Ill-Formed, No Diagnostic Required". Such a program isn't actually a valid C++ program and has absolutely no defined meaning. There may be no indication of a problem from your compiler, but alas it's entirely meaningless. This isn't behavioural, it's an invisible status of the entire program.
