Comment by kazinator

6 hours ago

Undefined behavior only means that ISO C doesn't give requirements, not that nobody gives requirements. Many useful extensions are instances where undefined behavior is documented by an implementation.

Including a header that is not in the program, and not in ISO C, is undefined behavior. So is calling a function that is not in ISO C and not in the program. (If the function is not anywhere, the program won't link. But if it is somewhere, then ISO C has nothing to say about its behavior.)

Correct, portable POSIX C programs have undefined behavior in ISO C; only if we interpret them via IEEE 1003 are they defined by that document.

If you invent a new platform with a C compiler, you can have it such that #include <windows.h> reformats all the attached storage devices. ISO C allows this because it doesn't specify what happens if #include <windows.h> successfully resolves to a file and includes its contents. Those contents could be anything, including some compile-time instruction to do harm.

Even if a compiler's documentation doesn't grant that a certain instance of undefined behavior is a documented extension, the existence of a de facto extension can be inferred empirically through numerous experiments: compiling test code and reverse engineering the object code.

Moreover, the source code for a compiler may be available; the behavior of something can be inferred from studying the code. The code could change in the next version. But so could the documentation; documentation can take away a documented extension the same way as a compiler code change can take away a de facto extension.

Speaking of object code: if you follow a programming paradigm of verifying the object code, then undefined behavior becomes moot, to an extent. You don't trust the compiler anyway. If the machine code has the behavior which implements the requirements that your project expects of the source code, then the necessary thing has been somehow obtained.

> Undefined behavior only means that ISO C doesn't give requirements, not that nobody gives requirements. Many useful extensions are instances where undefined behavior is documented by an implementation.

True, most compilers have sane defaults in many cases for things that are technically undefined (like taking sizeof(void) or doing pointer arithmetic on a void pointer). But not all of these cases can be saved by sane defaults.
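As a rough sketch of one such default: GCC and Clang, in their GNU modes, document arithmetic on `void *` as an extension that steps in units of one byte. The helper name `skip_bytes` is made up for illustration, and the `__GNUC__` branch assumes one of those compilers; the other branch is the ISO C spelling of the same thing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: advance a pointer by n bytes. */
static void *skip_bytes(void *base, size_t n)
{
    assert(base != NULL);
#ifdef __GNUC__
    /* GNU extension (assumed: GCC or Clang in a GNU mode):
     * void * arithmetic behaves as if sizeof(void) were 1.
     * ISO C gives no requirements for this expression. */
    return base + n;
#else
    /* ISO C spelling: do the arithmetic on a character pointer. */
    return (unsigned char *)base + n;
#endif
}
```

Both branches are meant to produce the same address; only the second one is something ISO C itself defines.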

Undefined behavior means the compiler can replace the code with whatever. So if you e.g. compile optimizing for size, the compiler may rip out the offending code, as replacing it with nothing yields the greatest size optimization.

See also John Regehr's collection of UB-Canaries: https://github.com/regehr/ub-canaries

Snippets of software exhibiting undefined behavior, executing e.g. both the true and the false branch of an if-statement or none etc. UB should not be taken lightly IMO...

  • > [...] undefined behavior, executing e.g. both the true and the false branch of an if-statement or none etc.

    Or replacing all your mp3s with a Rick Roll. Technically legal.

    (Some old version of GHC had a hilarious bug where it would delete any source code with a compiler error in it. Something like this would technically be legal for most compiler errors a C compiler could spot.)

Unfortunately it also means that when the programmer fails to understand what undefined behaviour is exposed in their code, the compiler is free to take advantage of that to do the ultimate performance optimizations as a means to beat compiler benchmarks.

The code change might come in something as innocent as a bug fix to the compiler.

  • Ah yes, the good old "compiler writers only care about benchmarks and are out to hurt everyone else" nonsense.

    I for one am glad that compilers can assume that things that can't happen according to the language do in fact not happen and don't bloat my programs with code to handle them.

    • Moral hazard here. The rest of us, and all of society, now rests on a huge pile of code written by incorrigible misers who imagined themselves able to write perfect, bug-free code that would go infinitely fast because bad things never happen. But see, there's bugs in your code and other people pay the cost.

    • > I for one am glad that compilers can assume that things that can't happen according to the language do in fact not happen and don't bloat my programs with code to handle them.

      Yes, unthinkable happenstances like addition on fixed-width integers overflowing! According to the language, signed integers can't overflow, so code like the following:

          int new_offset = current_offset + 16;
          if (new_offset < current_offset)
              return -1; // Addition overflowed, something's wrong
      

      can be optimized to the much leaner

          int new_offset = current_offset + 16;
      

      Well, I sure am glad the compiler helpfully reduced the bloat in my program!
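For what it's worth, the overflow check in the snippet above can be written so that the overflowing addition is never evaluated, which leaves the optimizer nothing to assume away. This is a minimal sketch, not anything from the original code: the helper name `checked_advance` and its return convention are made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: add step to current_offset.
 * Returns 0 and stores the sum in *new_offset on success,
 * or -1 if the addition would overflow int.
 * The range test happens BEFORE the addition, so no signed
 * overflow is ever evaluated and the check cannot be removed
 * on the grounds that overflow "can't happen". */
static int checked_advance(int current_offset, int step, int *new_offset)
{
    assert(new_offset != NULL);
    if (step > 0 && current_offset > INT_MAX - step)
        return -1;              /* would exceed INT_MAX */
    if (step < 0 && current_offset < INT_MIN - step)
        return -1;              /* would go below INT_MIN */
    *new_offset = current_offset + step;
    return 0;
}
```

With a step of 16 this would be called as `checked_advance(current_offset, 16, &new_offset)`, returning -1 in the same situation the original `if` was trying to catch.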

> Including a header that is not in the program, and not in ISO C, is undefined behavior.

What is this supposed to mean? I can't think of any interpretation that makes sense.

I think ISO C defines the executable program to be something like the compiled translation units linked together. But header files do not have to have any particular correspondence to translation units. For example, a header might declare functions whose definitions are spread across multiple translation units, or define things that don't need any definitions in particular translation units (e.g. enum or struct definitions). It could even play macro tricks which means it declares or defines different things each time you include it.

Maybe you mean it's undefined behaviour to include a header file that declares functions that are not defined in any translation unit. I'm not sure even that is true, so long as you don't use those functions. It's definitely not true in C++, where it's only a problem (not sure if it's undefined exactly) if you odr-use a function that has been declared but not defined anywhere. (Examples of odr-use are calling or taking the address of the function, but not, for example, using sizeof on an expression that includes it.)
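A small C sketch of the sizeof case, assuming a C11 compiler: `never_defined` is a made-up function that is declared but deliberately has no definition anywhere, and it only ever appears as the operand of sizeof, which is not evaluated. As far as I can tell this compiles and links, since no call is ever generated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function: declared here, defined in no translation unit. */
long never_defined(int x);

/* The operand of sizeof is not evaluated, so this is a compile-time
 * check on the result type and does not reference the function. */
static_assert(sizeof(never_defined(0)) == sizeof(long),
              "sizeof only looks at the type of the call expression");

/* Same idea at run time: returns sizeof(long) without calling anything. */
static size_t result_size(void)
{
    return sizeof(never_defined(0));
}
```

Calling `never_defined(0)` outside of sizeof, or taking its address, is what would need a definition to exist somewhere.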

  • > I can't think of any interpretation that makes sense

    Start with a concrete example. A header that is not in our program, or described in ISO C. How about:

      #include <winkle.h>
    

    Defined behavior or not? How can an implementation respond to this #include while remaining conforming? What are the limits on that response?

    > But header files do not have to have any particular correspondence to translation units.

    A header inclusion is just a mechanism that brings preprocessor tokens into a translation unit. So, what does the standard tell us about the tokens coming from #include <winkle.h> into whatever translation unit we put it into?

    Say we have a single file program and we made that the first line. Without that include, it's a standard-conforming Hello World.

    • Do you just mean an attempt to include a file path that couldn't be found? That's not a correct usage of the term "program": that refers to the binary output of the compilation process, whereas you're talking about the source files that are the input to the compilation. That sounds a bit pedantic but I really didn't understand what you meant.

      I just checked, and if you attempt to include a file that cannot be found (in the include path, though it doesn't use that exact term) then that's a constraint violation and the compiler is required to issue a diagnostic. Not undefined behaviour.

    • I think we are slowly getting closer to the crux of the matter. Are you saying that it's a problem to include files from a library since they are "not in our program"? What does that phrase actually mean? What are the bounds of "our program" anyway? Couldn't it be the set {main.c, winkle.h}?

You are basically trying to explain the difference between a conforming program and a strictly conforming one.