
Comment by loeg

4 years ago

> 203 is enough to make almost every line of code questionable. The result of this is that looking at a simple 3 line C program and being asked whether the program terminates is undecidable without knowing which compiler was used.

This is hyperbole to the point of being nonsensical.

> Null dereference for example is undefined behavior, and could cause a termination or not, depending on the implementation, even if it is known to be standards conforming to C11.

This sentence doesn't make any sense. If your C code has UB, it is wrong. The behavior of particular environments around certain UB is irrelevant to standards-conforming code, because standards-conforming code doesn't have UB.

> This is hyperbole to the point of being nonsensical.

I think you can only say this if you've never had aggressive compiler optimizations introduce security issues into perfectly reasonable-looking code.

Quiz: what's wrong with the following code?

    int buflen, untrusted;
    char buf[MAX];

    /* `untrusted` comes from an untrusted source */

    if (buflen + untrusted > MAX) {
        return -EINVAL;
    }

The answer of course is that signed integer overflow is undefined behavior; so if buflen + untrusted exceeds INT_MAX, the compiler is allowed to do absolutely anything it wants, and making sure it only does something sensible turns out to be extremely difficult.

EDIT For instance, in an earlier age, people might have done something like this:

    if (buflen + untrusted > MAX || buflen + untrusted < buflen)

But the second clause relies on signed overflow actually happening. The compiler is perfectly justified in reasoning, "Signed overflow is UB, so I may assume it never happens; that means buflen + untrusted < buflen can never be true, so I'll make this code more efficient by removing that check entirely."
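
For completeness, one overflow-safe way to write the check is to compare against the remaining headroom instead of computing the sum. A minimal sketch, assuming both lengths are non-negative ints and MAX fits in an int -- the function wrapper, the concrete MAX value, and the name check_length are just for illustration:

    #include <errno.h>

    #define MAX 4096   /* placeholder; stands in for whatever MAX really is */

    static int check_length(int buflen, int untrusted)
    {
        if (buflen < 0 || untrusted < 0)
            return -EINVAL;              /* reject nonsense values up front */
        if (untrusted > MAX - buflen)    /* no addition, so no signed overflow */
            return -EINVAL;
        return 0;                        /* here buflen + untrusted <= MAX */
    }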

> If your C code has UB, it is wrong.

This goes against the very notion of UB. If some code were simply wrong, the standard would say it is not allowed and it would result in a compile error, or at least a runtime error. As it is, the language standards choose to leave the behaviour open, almost as if to concede that the standard can't cover every base. UB isn't wrong, almost by definition. It's just implementation-specific, and that's my point. We don't have one overarching C language, we have a hundred or so C dialects.

  • One problem here is that correct code relies on valid inputs in order to avoid UB -- undefined behaviour is a property of a running program, rather than (necessarily) a static property of an isolated unit of code.

    In this way, UB is essentially the converse of Rust's `unsafe` -- we must assume that our caller won't pass in values that would trigger undefined behaviour, and we don't necessarily have the local context to be able to tell at runtime whether our behaviour is well-defined or not.

    There definitely are instances where local checks can avoid UB, but it's also perfectly possible to write a correct program where a change in one module causes UB to manifest via a different module -- use-after-free is the classic here (a small sketch follows after the footnotes). So we can have two modules which in isolation couldn't be said to have any bugs, but which still exhibit UB when they interact with each other.

    And that's before we start getting into the processing of untrusted input.

    A C compiler -- and especially the optimiser -- assumes[1] that the conditions for provoking UB won't occur, while the Rust compiler (activate RESF[0]) mostly has defined behaviour that's either the same as common C compilers would give for a local UB case[2] in practice, or has enough available context to prove that the UB case genuinely doesn't happen.

    [0] https://enet4.github.io/rust-tropes/rust-evangelism-strike-f...

    [1] Proof by appeal to authority: I was a compiler engineer, back in the day.

    [2] Signed integer wrap-around is the classic here: C assumes it can't happen, Rust assumes it might but is much less likely to encounter code where there's a question about it happening.
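
    To make the use-after-free point concrete, here's a minimal single-file sketch -- the "module" boundaries are only marked with comments, and the names (cache_get, cache_set, report) are invented for illustration. Each half looks fine in isolation; the UB only appears in the combination:

        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        /* "cache" module: owns a heap buffer and hands out pointers to it */
        static char *current;

        const char *cache_get(void) { return current; }

        void cache_set(const char *s)     /* frees the old buffer: old pointers dangle */
        {
            free(current);
            current = malloc(strlen(s) + 1);
            if (current)
                strcpy(current, s);
        }

        /* "report" module: written back when nothing replaced the entry mid-report */
        void report(void)
        {
            const char *name = cache_get();
            cache_set("updated");         /* call added later, in the other module */
            printf("%s\n", name);         /* use after free -- UB introduced at a distance */
        }

        int main(void)
        {
            cache_set("initial");
            report();
            return 0;
        }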

  • I always thought that code with UB is wrong, and that UB allows the implementation to deal with it in its own way (it is allowed to ignore it, stop the program, corrupt memory, delete the hard drive's contents...).

    So if your code has UB then it is wrong; the one thing not specified in the standard is the exact consequences of that.

    (Yes, in some hacks one may rely on UB behaving in a particular way in particular circumstances - but that will still be a hack.)

    • Suppose it is wrong, though; that implies a good chunk of C code out there is wrong code. Yet it compiles and people are using it, which means that their code does not conform to the standard. Just as wrong math isn't math at all, wrong C is not C. People are therefore writing code whose runtime characteristics are not defined by any standard. Thus it is not actually C; it's the language of whatever compiler they're using.


  • There's "implementation-defined" behavior, and then there is "undefined behavior". I think you're conflating the two.

  • I still think undefined behavior is the wrong choice here. It should have been implementation-defined, like what happens if you bit shift a negative integer to the right. They could pick two's complement or trap on overflow or whatever is most convenient on their platform, but not just assume it will never happen.
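
    For what it's worth, you can already opt into well-defined wrapping yourself by doing the arithmetic in unsigned types, where overflow wraps modulo 2^N by definition; only the conversion back to int is implementation-defined rather than undefined, which is exactly the distinction being argued about. A minimal sketch (the function name is made up):

        static int wrapping_add(int a, int b)
        {
            /* Unsigned addition wraps modulo UINT_MAX + 1 -- defined behaviour. */
            /* Converting an out-of-range result back to int is                  */
            /* implementation-defined (C11 6.3.1.3), not undefined.              */
            return (int)((unsigned)a + (unsigned)b);
        }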