Comment by Liquid_Fire

3 years ago

> If you're doing a check for UB, the reasonable thing, to me is to maintain that check.

The problem is that you need to do the check before you cause UB, not after, and here the check appears after. If you do the check before, the compiler will not touch it.

The compiler can't know that this code is part of a UB check (so it should leave it alone), whereas this other code here isn't a UB check but is just computation (so it should assume no UB and optimise it). It just optimises everything, and assumes you don't cause UB anywhere.

Now, I'm not defending this approach, but C works like this for performance and portability reasons. There are modern alternatives that give you most or all of the performance without all these traps.

>for performance and portability reasons

Is it more performant?

How would you do the check in the article in a more performant way?

Philosophically I'm not sure it's even possible. Sure you could do the check before the overflow but any way you slice it that calculation ultimately applies to something that is going to be UB so the compiler is free to optimise it out? Yes you can make it unrelated enough that the compiler doesn't realise. But really if the compiler can always assume you aren't going to overflow integers, then it should be able to optimise away 'stupid' questions like 'if I add X and y, would that be an overflow?'.

>The compiler can't know that this code is part of a UB check

If it doesn't know what the code is then it shouldn't be deleting it. It has just rearranged code that it knows is UB, it is now faced with a check on that UB. It could (and does) decide that can't possibly happen, because 'UB'. It could instead decide that it is UB and so doesn't know if this check is meaningful or not, and not delete the check, this to me is the original point of UB, C doesn't know whether your machine is 1s complement, 2s complement or 3s complement, it leaves it to the programmer to deal with the situation, if the programmer knows he's working on 2s complement machines that overflow predictably he can work on that assumption, the compiler isn't expected to know, but it should stay out of the way because the programmer does. The performance of c as I understood it is that overflow check is optional, you aren't forced to check. But you are required to ensure that the check is done if needed, or deal with the consequences.

Would you get rid of something you don't understand because you can't see it doing something useful. Or would you keep it because you don't know what you might break when you delete it? GCC in this case is deleting something it doesn't understand. Why is that not a bug?

  • > Sure you could do the check before the overflow but any way you slice it that calculation ultimately applies to something that is going to be UB so the compiler is free to optimise it out?

    No, if you never do the calculation it's not going to be UB.

      int8_t x = some_input();
      if (x > 10) return bad_value;
      else x *= 10;
    

    There is no UB here, because we never execute the multiplication in cases where it would have otherwise been UB. The compiler is not free to remove the check, because it can't prove that the value is not > 10.

    > It has just rearranged code that it knows is UB

    No - that's the problem. The compiler doesn't know that the code is UB, because this depends on the exact values at runtime, which the compiler doesn't know.

    In some limited cases it could perform data flow analysis and know for sure that it will be UB, but those cases are very limited. In general there is no way for to know. So there are three things it could do:

    A) Warn/error if there could possibly be UB. This would result in warnings in hundreds of thousands of pieces of legitimate code, where there are in fact guarantees about the value but the compiler can't prove or see it. It would require much more verbose code to work around these, or changing the language significantly. For example, you could represent this in the type system, or have annotations.

    B) Insert runtime checks for the UB. This would have a significant performance overhead, as there are lots of "innocent" operations in the language that, in the right circumstances, lead to UB. So we would bloat the code with a lot of branches, 99.999% of which will never ever be taken, filling up the instruction cache and branch predictor. You get something more like (the runtime behaviour of) Python or JavaScript. Or even C if you enable UBSan.

    C) Assume that the programmer has inserted these checks where they are needed, and omitted them where they are not. You get performance, but in exchange for that you are responsible for avoiding UB. This is what C chooses.

    > C doesn't know whether your machine is 1s complement, 2s complement or 3s complement, it leaves it to the programmer to deal with the situation, if the programmer knows he's working on 2s complement machines that overflow predictably he can work on that assumption, the compiler isn't expected to know, but it should stay out of the way because the programmer does

    This is mostly right, but with the caveat that you can't invoke UB. If you want to deal with whatever the underlying representation is, cast it to an unsigned type and then do whatever you want with it. The compiler will not mess with your unsigned arithmetic, because it's allowed to wrap around. But for signed types, you are promising to the compiler that you won't cause overflow. In exchange the compiler promises you fast signed arithmetic.

    This promise is part of the language, not part of GCC. If you removed that promise, you would have to pay the price in reduced performance.

    Could you have a C compiler that inserts these checks? Yes (see UBSan). But you would be throwing away performance - it would be slower than GCC/Clang/MSVC/etc. If you're writing performance-sensitive software, you are better off either ensuring you never trigger UB, or use another language like Rust. If performance is not so important, you are probably better off writing the thing in Go/JavaScript/whatever.

    • >No, if you never do the calculation it's not going to be UB.

        int8_t x = some_input();
        if (x > 10) return bad_value;
        else x *= 10
      

      In this simple case yes. But what if you don't know what you're going to multiply by? What if you can't say that X is a bad value?

      If you have: Long long x = ?; Long long y = ?;

        If (????); x *= y;
      

      I don't know the answer to this. I've looked online and the answers invoke UB. The best I can think of is a LUT of safe / unsafe combinations, but that isn't faster, and when you're at that point you may as well give up on the MUL hardware in your cpu, I'm not even sure how to safely calculate the LUT, I suppose you could iterate with additions subbing the current total from int_max and checking if that's bigger than the number you're about to add. But that's frankly stupid. And again you are basically checking if something is going to be UB which can't happen the compiler is therefore free to remove the check. Or do you roll your own data type with unsigned ints and a sign bit? But but then what's the point of having signed ints, and what happens to Cs speed. Or is there some bit twiddling you can do?

      >No - that's the problem. The compiler doesn't know that the code is UB

      Ok I should properly have said, code it can't prove isn't UB.

      If it can't say X + y isn't an overflow it shouldn't just assume it can't.

      If y is 1 and X is probably 9 it wouldn't be reasonable to assume the sum is 10.

      >C) Assume that the programmer has inserted these checks where they are needed, and omitted them where they are not. You get performance, but in exchange for that you are responsible for avoiding UB

      You get the performance by avoiding option B. I'm not even sure the programmer is responsible for avoiding UB? UB just doesn't give guarantees about what will happen. You should still be able to invoke it, and I would contend, expect the compiler to do something reasonable.

      2 replies →