
Comment by vgatherps

3 years ago

Somewhat unfortunately this is valid behavior according to the standard. Having to go through walls of text in a standard to prevent the compiler from deleting your security checks because of how you multiplied two integers seems a bit silly.

Having said that, I’m of split feelings here. I work in a very performance-sensitive industry, and on one hand I welcome the ability of compilers to use the knowledge that certain things “can’t happen” to optimize, without my having to always do these optimizations by hand.

On the other hand, there seem to be so many cases like this one where the “undefined behavior code deleter goes brrrrrr” really overextends its usefulness. The “lalalalala standard say I can do this can’t hear you” finger in your ears attitude from compiler maintainers doesn’t help at all either.

I understand that the way much of this works, propagating “poison/impossible values”, can hide the root cause, so you can’t just say “please do the good undefined behavior optimizations but not the bad”; there’s no easy answer. The outcome in the blog post doesn’t feel like a local optimum, though, and it’s not the only place I’ve felt that your options are “potentially slow code” or “pray you were perfect enough to not have your program deleted”.
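The blog post's code isn't reproduced here, so the following is only a sketch of the general remedy for a check that gets deleted after an overflowing multiplication: test whether the product would overflow *before* computing it, so no overflowed (undefined) value ever exists for the optimizer to reason from. The helper name `mul_int_checked` is hypothetical; the case analysis is the well-known division-based pre-check for signed `int`.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a * b in *out and returns true only if the
   product fits in an int. Each comparison uses a division that cannot
   itself overflow, so the test runs before any signed overflow could occur. */
static bool mul_int_checked(int a, int b, int *out)
{
    if (a > 0) {
        if (b > 0) {
            if (a > INT_MAX / b) return false;            /* pos * pos too large */
        } else {
            if (b < INT_MIN / a) return false;            /* pos * non-pos too small */
        }
    } else {
        if (b > 0) {
            if (a < INT_MIN / b) return false;            /* non-pos * pos too small */
        } else {
            if (a != 0 && b < INT_MAX / a) return false;  /* neg * non-pos too large */
        }
    }
    *out = a * b;  /* safe: the product is known to be representable */
    return true;
}
```

On GCC and Clang, `__builtin_mul_overflow(a, b, &r)` performs the same test (returning true on overflow), and C23 standardizes the idea as `ckd_mul` in `<stdckdint.h>`.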

The problem here is that the thing that "can't happen" isn't actually something that can't happen, it's something that isn't allowed to happen according to a many-hundred-page document that approximately nobody reads. It's not something that can be optimised because the compiler can prove it cannot happen, it is allowed to be optimised because the standard says "dear programmer, if you ever make this happen, god help you".

  • I think this view is slightly unfair. I think of UB as the compiler saying "when you promised this thing wouldn't happen, I took you at your word. If bad things happen because you lied, they're your fault, not mine."

    • Lying requires intent. This was a mistake, something that humans are well-known for making, and if the compiler is designed to assume otherwise, it borders on useless in the real world.


    • I don't think this is a fair comparison as this is all based on implicit inferences by the compiler.

      If the programmer had specifically invoked the "__assert_valid_pointer(p)" standard function (which does not exist) to promise the compiler that the pointer was valid, then it would be fine.

      The problem is that there are a lot of places where the compiler makes these assumptions.

    • This is a good positive model of UB: fulfilling the compiler's assumptions.

  • Even if someone read all those pages, constraining ourselves to ISO C only, there is no way that after a year they would still remember the roughly 200 UB cases documented there.

    Which is why everyone should adopt static analysis tooling and enable all the warnings related to UB, pointer misuse, and cast misuse.

    Many think they know better; it is like builders who think they don't need protective gear on a construction site because it is only for the weak.

    • I think runtime checks implicitly added by the compiler are a more robust and reliable solution than static analysis. For example, for pointer dereferences the compiler could insert a 0-offset dummy load if the load is not guaranteed to be within a page of the pointer. Or it could add abort-on-overflow for math. Or bounds checking where possible.

      It will have a non-trivial cost, but hopefully aggressive optimizations can remove many of these checks (which, ironically, is exactly the kind of optimization people are complaining about), and compilers could provide pragmas to disable them when critical.

      In a way sanitizers are getting there, but they are explicitly marked as for non-production use which is a problem.


  • Except this "can't happen" happens many, many times in practice, so maybe it's time the language bureaucrats got off their high horse (but they won't).

  • It's still braindead and idiotic. Every relevant platform nowadays has well defined overflow for signed ints. A sane C compiler should go with that and base its optimizations on it. GCC has been a pile of garbage in this regard for many years now. Its devs get further removed from reality with every year. Treating signed int overflow as undefined should be hidden behind a flag.

    • The C/C++ language doesn't provide for a way for the compiler to see that you really meant this one check to take precedence over the implicit promise in another.

      The reason why C++ is always relevant here (though C macros and inlining cause similar issues) is that generic programming being close to optimal is a language feature - and one of the ways that's possible is by letting you write reusable code that might be "called" from a context in which some of the checks or conditions just aren't necessary. It's by design that the optimizer gets to... well, optimize that kind of code.

      There's a solid case to be made that the details of C's UB weren't well chosen and we should try to update them; but which decades old choices are perfect? Which are easy to change once there's this much legacy software in operation?

      Don't forget that some of those UBs were chosen to deal with hardware realities of the day; i.e. that the "same" operation on different hardware would do different things. For example, leaving signed integer overflow undefined might allow a C compiler to use a signed register that's wider than necessary, which may help on hardware that doesn't have every possible register width, or where there are complex register usage limitations. I'm no hardware geek; I'm sure somebody here knows of real examples where UB allows portability, because that's the point: UB allows people to write portable, performant code - just don't do certain things, and you're fine... which leads us to today's situation, in which UB can feel like a minefield.


    • Signed int overflow being UB is one of the most basic UBs of the language, and what allows generating tight code in loops.

      This is not new: -fwrapv was introduced in 2003, but it can quite severely impact code quality. If you don’t care, just set that. Then complain that C is slow, because C is a shit language.


    • It's not about what your CPU does.

      These days undefined overflow for signed integers is mostly used by compilers to be able to assume that e.g. 'a + 1 > a' is always true, and thus eliminate redundant checks.

      (And you wouldn't typically write code like 'a + 1 > a', but you can get it either from code generation via macros etc. or as an intermediate result of previous optimization passes.)

  • Basically, the compiler implements integer addition using an operation that doesn't match the semantics of integer addition in the standard, then hallucinates that it did. That is:

    1) The compiler sees an expression like "a += b;" where a and b are signed integers.

    2) It emits "add rA rB" in x86 assembly (rA/B being the register a/b is currently in).

    3) Technically the machine code emitted does not match the semantics of the source code, since it uses wraparound addition, whereas the C standard says that for the operation to be valid, the values of a and b must be such that no overflow would occur. This is fine however, because the implementation has the freedom to do anything on integer overflow, including just punting the problem to hardware as it did in this case.

    4) The compiler proceeds with the rest of the code as if the line above would never overflow. My brother in the machine spirit, you chose to translate my program to a form where integer overflow is defined.

    The compiler should either a) trap on integer overflow; or b) accept integer overflow. It will be fine if it chooses either a) or b) situationally, i.e. if we have a loop where assuming no overflow is faster, then by all means - add a precondition check and crash the program if it's false, but don't just assume overflow doesn't happen when you explicitly emit code with well-defined overflow semantics.

    The bigger problem is there is pretty much no way to guard against this. The moment your program is longer than one page you're screwed. You may think all your functions are fine, but then you call something from some library, the compiler does some inlining and suddenly there's an integer overflow where you didn't expect, leading to your bounds check being deleted.
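To make the inlining scenario above concrete, here is a minimal sketch (the function name is hypothetical) of a bounds check written so that it never evaluates a signed sum that could overflow. A check like `off + n <= len` is only meaningful if `off + n` does not overflow; because signed overflow is UB, the compiler may reason as if it never does, which is how such a check can end up weakened or folded away once surrounding code is inlined. Rearranging the comparison to subtract from a value known to be larger removes that dependency.

```c
#include <stdbool.h>

/* Returns true if the range [off, off + n) lies within [0, len).
   Never computes off + n, so no signed overflow (and no UB) is possible. */
static bool range_in_bounds(int off, int n, int len)
{
    if (off < 0 || n < 0 || len < 0)
        return false;
    /* len - off is only evaluated once 0 <= off <= len,
       so the subtraction cannot overflow. */
    return off <= len && n <= len - off;
}
```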

We need a -fsane-c

Then they can add a #pragma optimize(assumes=no-int-overflow, whatever, etc) to precisely add optimizations when needed and you 'know' it's safe.
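The `#pragma optimize(assumes=...)` proposed above does not exist. The closest thing available today is an explicit, local promise: GCC and Clang provide `__builtin_unreachable()`, C23 adds an `unreachable()` macro in `<stddef.h>`, and C++23 has `[[assume(expr)]]`. The sketch below assumes GCC or Clang; `ASSUME` and `sum_multiple_of_4` are hypothetical names. The point is that the assumption is written where a reviewer can see it, rather than inferred from some distant arithmetic.

```c
/* Hypothetical macro: tells the optimizer that cond always holds.
   If cond is ever false, behavior is undefined like any other UB,
   but at least the promise is explicit and greppable. */
#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)

/* Sums v[0..n-1]. The caller promises n is a non-negative multiple of 4,
   which the compiler may exploit, e.g. by skipping a remainder loop. */
static long sum_multiple_of_4(const int *v, int n)
{
    ASSUME(n >= 0 && n % 4 == 0);
    long s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}
```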

  • Everyone wants that, but when asked for a concrete specification they seem to realize that it is harder than it sounds. Look for John Regehr's blog entries about "Friendly C" for an example. The basic problem here is that C is a terrible language. We should just give up on it by now.

    • I provided the concrete specification. What is hard about it? Get it done already.

  • This already exists. Don't write standard C, avoid it like the plague. Compile with -fno-strict-overflow -fno-strict-aliasing -fno-delete-null-pointer-checks, like I do, like Linux kernel does, and like everyone sane does.

    • No, not everyone sane. Rather everyone sane who has been bitten enough by these issues to use such rules. Everyone starts out at -O2, because understanding all the other flags and their implications is super difficult. As long as the insane setting is default, a large percentage of programmers will be using the insane setting. Arguing that they should have flagged their compilations otherwise is about as useful as pointing out that people shouldn't write UB in the first place.


  • You can get 99% of the way there with -fno-delete-null-pointer-checks -fno-strict-aliasing -fwrapv. Pretty much every program I've worked on uses those flags, as that's the only way to keep your sanity.

  • -fdwim

    Next generation of AI powered compilers will try to interpret code at a more abstract level and infer what the programmer was thinking even if they wrote the wrong thing.

    Everything will work perfectly 100% of the time.
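Of the flags listed above, `-fno-strict-aliasing` is the one whose need can often be avoided in the source itself. A minimal sketch, assuming `float` is IEEE-754 binary32 (only the size is checked, via a static assertion): reading a float's bits through a `uint32_t *` violates C's effective-type rules, whereas copying the bytes with `memcpy` is well-defined, and GCC and Clang typically compile that copy down to a single register move. `float_bits` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "assumes a 32-bit float");

/* Returns the object representation of f as a 32-bit integer.
   The tempting alternative, *(uint32_t *)&f, accesses a float object
   through an incompatible lvalue type: UB under strict aliasing. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);  /* byte copy: well-defined */
    return u;
}
```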

The hard thing about C is knowing all the footguns you are getting yourself into. If our electrical grid were built like this, we would have no isolation, no fuses, no RCDs, and a constant torrent of electrocutions and fires. It is bad engineering.

  • ... and people defending the status quo because "you just have to know how electricity works" :-)

> On the other hand, there seem to be so many cases like this one where the “undefined behavior code deleter goes brrrrrr” really overextends its usefulness.

I would simply not depend on invoking UB as part of my program's behavior (?).

  • That's the only way to use C correctly.

    Alas, it's nearly impossible for a mere mortal to write any non-trivial C code that doesn't have UB.

    • "a+b" for signed integer types potentially invokes undefined behavior (overflow). It's a horrendous situation.

  • Alternatively I would not trust myself with not doing that, which is why I don’t use C.

  • I don't think I ever claimed that one would or should intentionally invoke UB as part of their program's behavior?
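On the "a+b" point above: when wraparound really is the intended semantics, one option that does not depend on `-fwrapv` is to do the arithmetic in `unsigned`, where wraparound is defined by the standard. A minimal sketch with a hypothetical name: converting the out-of-range result back to `int` is implementation-defined rather than undefined, and GCC documents it as reduction modulo 2^N, so the sketch assumes GCC or a compiler that behaves the same way (Clang does in practice).

```c
/* Adds two ints with two's-complement wraparound and no signed overflow.
   The addition happens in unsigned int, where wraparound is well-defined;
   the conversion back to int is implementation-defined, not undefined. */
static int wrapping_add(int a, int b)
{
    return (int)((unsigned)a + (unsigned)b);
}
```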