
Comment by ironmagma

4 years ago

> If your C code has UB, it is wrong.

This goes against the sheer notion of UB. If some code was wrong, the standard would say it is not allowed and it would result in a compile error, or at least a runtime error. As it is, the language standards choose to leave it open almost as if to concede that the standard can’t cover every base. UB isn’t wrong, almost by definition. It’s just implementation specific, and that’s my point. We don’t have an overarching C language, we have a hundred or so C dialects.

One problem here is that correct code relies on valid inputs in order to avoid UB -- undefined behaviour is a runtime property of a running program, rather than (necessarily) a static property of an isolated unit of code.

In this way, UB is essentially the converse of Rust's `unsafe` -- we must assume that our caller won't pass in values that would trigger undefined behaviour, and we don't necessarily have the local context to be able to tell at runtime whether our behaviour is well-defined or not.
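To make that concrete, here is a minimal sketch in C. The function name `element_at` and its parameters are made up for illustration. The body is perfectly fine for valid arguments, and nothing inside it can prove that `arr` really points to `len` elements; that is a promise only the caller can keep.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * Well-defined when arr points to at least len ints and i < len.
 * If the caller lies about len, the read below is UB and this
 * function has no local way to find out. */
int element_at(const int *arr, size_t len, size_t i) {
    /* Catches an out-of-range index relative to the claimed len
     * (when NDEBUG is not defined), but cannot check that len
     * itself is truthful. */
    assert(arr != NULL && i < len);
    return arr[i];
}
```

Called as `element_at(xs, 3, 2)` on a three-element array this is well-defined; called with a `len` larger than the real array, the same source line can become UB without any change to the function itself.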

There definitely are instances where local checks can avoid UB, but it's also perfectly possible to write a correct program where a change in one module causes UB to manifest via a different module -- use-after-free is a classic here. So we can have two modules which in isolation couldn't be said to have any bugs, but which still exhibit UB when they interact with each other.
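A rough sketch of that kind of interaction, with two made-up "modules" collapsed into one file (all names here are hypothetical): module A owns a string and replaces it on update, module B keeps a pointer it was handed. Each half looks reasonable when read alone.

```c
#include <stdlib.h>
#include <string.h>

/* "Module A": owns a heap-allocated name and frees the old one on update. */
static char *a_name = NULL;

int a_set_name(const char *s) {
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL) {
        return -1;
    }
    strcpy(copy, s);
    free(a_name);      /* the old buffer is gone after this */
    a_name = copy;
    return 0;
}

const char *a_get_name(void) {
    return a_name;
}

/* "Module B": remembers a pointer it was given and reads it later. */
static const char *b_cached = NULL;

void b_remember(const char *p) {
    b_cached = p;
}

size_t b_cached_length(void) {
    return b_cached != NULL ? strlen(b_cached) : 0;
}
```

The sequence `a_set_name("alice"); b_remember(a_get_name()); b_cached_length();` is fine. Add a second `a_set_name("bob")` before the last call and `b_cached` now dangles, so `b_cached_length()` reads freed memory. Neither module changed, and neither is obviously buggy in isolation.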

And that's before we start getting into the processing of untrusted input.

A C compiler -- and especially the optimiser -- assumes[1] that the conditions for provoking UB won't occur, while the Rust compiler (activate RESF[0]) mostly has defined behaviour that's either the same as what common C compilers would give in practice for a local UB case[2], or has enough available context to prove that the UB case genuinely doesn't happen.

[0] https://enet4.github.io/rust-tropes/rust-evangelism-strike-f...

[1] Proof by appeal to authority: I was a compiler engineer, back in the day.

[2] Signed integer wrap-around is the classic here: C assumes it can't happen, Rust assumes it might but is much less likely to encounter code where there's a question about it happening.
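As an illustration of [2], one common way to stay out of signed-overflow UB in C is to test the operands before adding, rather than inspecting the result afterwards (an after-the-fact test such as `a + b < a` has already overflowed by the time it runs, so the optimiser is entitled to assume it is false). The helper name `checked_add` below is an assumption for this sketch, not a standard C function.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out and returns true when the
 * sum fits in an int; returns false and leaves *out untouched when it
 * would overflow. The range checks themselves never overflow. */
bool checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```

With this, `checked_add(INT_MAX, 1, &r)` reports failure instead of executing an addition whose behaviour the standard does not define.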

I always thought that code with UB is wrong, and that UB allows the implementation to deal with it in its own way (it is allowed to ignore it, stop the program, corrupt memory, delete hard drive contents...).

So if your code has UB then it is wrong; the only thing the standard leaves unspecified is the exact consequences.

(Yes, in some hacks one may rely on UB behaving a certain way in certain circumstances, but it will be a hack.)

  • Suppose it is wrong, though; that implies a good chunk of C code out there is wrong code. Yet it compiles and people are using it, which means that their code does not conform to the standard. Just as wrong math isn’t math at all, wrong C is not C. People are therefore writing code whose runtime characteristics are not defined by any standard. Thus it is not actually C, it’s the language of whatever compiler they’re using.

    • A working and usable program typically contains wrong code of various kinds.

      Nontrivial bug-free programs are an extreme rarity.

      > wrong C is not C

      Buggy C is still C. If, on discovering undefined behavior, people treat it as a bug, then it is just a C program with some bugs in it.

      If, on discovering undefined behavior, people treat it as acceptable ("on my compiler it does XYZ, therefore I will knowingly do ABC"), then it is becoming something else.


There's "implementation-defined" behavior, and then there is "undefined behavior". I think you're conflating the two.

I still think undefined behavior is the wrong choice here. Signed overflow should have been implementation-defined, like what happens if you bit shift a negative integer to the right. Implementations could pick two's complement or trap on overflow or whatever is most convenient on their platform, but not just assume it will never happen.
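For what the implementation-defined case looks like in practice, here is a small sketch (the function name is made up). Right-shifting a negative signed value has to produce some result that the implementation documents; mainstream compilers on two's complement targets typically do an arithmetic shift, but that is the implementation's choice rather than the standard's.

```c
#include <stdbool.h>

/* Hypothetical probe: reports whether this implementation shifts
 * negative values arithmetically (sign-extending). Either answer is
 * a documented choice of the compiler, not a licence for the
 * optimiser to assume the shift never happens. */
bool right_shift_is_arithmetic(void) {
    return (-8 >> 1) == -4;
}
```

The program has a definite meaning on every conforming compiler; it just may not be the same meaning everywhere. That is the property signed overflow lacks.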