← Back to context

Comment by Liquid_Fire

3 years ago

> Sure you could do the check before the overflow but any way you slice it that calculation ultimately applies to something that is going to be UB so the compiler is free to optimise it out?

No, if you never do the calculation it's not going to be UB.

  int8_t x = some_input();
  if (x > 10) return bad_value;
  else x *= 10;

There is no UB here, because we never execute the multiplication in cases where it would have otherwise been UB. The compiler is not free to remove the check, because it can't prove that the value is not > 10.

> It has just rearranged code that it knows is UB

No - that's the problem. The compiler doesn't know that the code is UB, because this depends on the exact values at runtime, which the compiler doesn't know.

In some limited cases it could perform data flow analysis and know for sure that it will be UB, but those cases are very limited. In general there is no way for to know. So there are three things it could do:

A) Warn/error if there could possibly be UB. This would result in warnings in hundreds of thousands of pieces of legitimate code, where there are in fact guarantees about the value but the compiler can't prove or see it. It would require much more verbose code to work around these, or changing the language significantly. For example, you could represent this in the type system, or have annotations.

B) Insert runtime checks for the UB. This would have a significant performance overhead, as there are lots of "innocent" operations in the language that, in the right circumstances, lead to UB. So we would bloat the code with a lot of branches, 99.999% of which will never ever be taken, filling up the instruction cache and branch predictor. You get something more like (the runtime behaviour of) Python or JavaScript. Or even C if you enable UBSan.

C) Assume that the programmer has inserted these checks where they are needed, and omitted them where they are not. You get performance, but in exchange for that you are responsible for avoiding UB. This is what C chooses.

> C doesn't know whether your machine is 1s complement, 2s complement or 3s complement, it leaves it to the programmer to deal with the situation, if the programmer knows he's working on 2s complement machines that overflow predictably he can work on that assumption, the compiler isn't expected to know, but it should stay out of the way because the programmer does

This is mostly right, but with the caveat that you can't invoke UB. If you want to deal with whatever the underlying representation is, cast it to an unsigned type and then do whatever you want with it. The compiler will not mess with your unsigned arithmetic, because it's allowed to wrap around. But for signed types, you are promising to the compiler that you won't cause overflow. In exchange the compiler promises you fast signed arithmetic.

This promise is part of the language, not part of GCC. If you removed that promise, you would have to pay the price in reduced performance.

Could you have a C compiler that inserts these checks? Yes (see UBSan). But you would be throwing away performance - it would be slower than GCC/Clang/MSVC/etc. If you're writing performance-sensitive software, you are better off either ensuring you never trigger UB, or use another language like Rust. If performance is not so important, you are probably better off writing the thing in Go/JavaScript/whatever.

>No, if you never do the calculation it's not going to be UB.

  int8_t x = some_input();
  if (x > 10) return bad_value;
  else x *= 10

In this simple case yes. But what if you don't know what you're going to multiply by? What if you can't say that X is a bad value?

If you have: Long long x = ?; Long long y = ?;

  If (????); x *= y;

I don't know the answer to this. I've looked online and the answers invoke UB. The best I can think of is a LUT of safe / unsafe combinations, but that isn't faster, and when you're at that point you may as well give up on the MUL hardware in your cpu, I'm not even sure how to safely calculate the LUT, I suppose you could iterate with additions subbing the current total from int_max and checking if that's bigger than the number you're about to add. But that's frankly stupid. And again you are basically checking if something is going to be UB which can't happen the compiler is therefore free to remove the check. Or do you roll your own data type with unsigned ints and a sign bit? But but then what's the point of having signed ints, and what happens to Cs speed. Or is there some bit twiddling you can do?

>No - that's the problem. The compiler doesn't know that the code is UB

Ok I should properly have said, code it can't prove isn't UB.

If it can't say X + y isn't an overflow it shouldn't just assume it can't.

If y is 1 and X is probably 9 it wouldn't be reasonable to assume the sum is 10.

>C) Assume that the programmer has inserted these checks where they are needed, and omitted them where they are not. You get performance, but in exchange for that you are responsible for avoiding UB

You get the performance by avoiding option B. I'm not even sure the programmer is responsible for avoiding UB? UB just doesn't give guarantees about what will happen. You should still be able to invoke it, and I would contend, expect the compiler to do something reasonable.

  • It is tedious but possible to check for overflow before multiplying signed integers.

        long long x = (...);
        long long y = (...);
        long long z;
        
        
        // Portable
        bool ok = x == 0 || y == 0;
        if (!ok) {
            long long a = x > 0 ? x : -x;
            long long b = y < 0 ? y : -y;
            if ((x > 0) == (y > 0))
                ok = -LONG_LONG_MAX / a <= b;
            else
                ok = LONG_LONG_MIN / a <= b;
        }
        if (ok)
            z = x * y;
        
        
        // Compiler-specific
        bool ok = !__builtin_smulll_overflow(x, y, &z);
    

    https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins...

    • Thanks for that.

      Slightly worrying that I didn't come across this or a variation in my searches