← Back to context

Comment by matheusmoreira

8 hours ago

What stage is the "just make the compiler define the undefined" stage?

Unaligned access? Packed structs. Compiler will magically generate the correct code, as if it had always known how to do it right all along! Because it has, in fact, always known how to do it right. It just didn't.

Strict aliasing? Union type punning. Literally documented to work in any compiler that matters, despite the holy C standard never saying so. Alternatively, just disable it straight up: -fno-strict-aliasing. Enjoy reinterpreting memory as you see fit. You might hit some sharp edges here and there but they sure as hell aren't gonna be coming from the compiler.

Overflow? Just make it defined: -fwrapv. Replace +, -, * with __builtin_*_overflow while you're at it, and you even get explicit error checking for free. Nice functional interface. Generates efficient code too.

The "acceptance" stage is really "nobody sane actually cares about the C standard". The standard is garbage, only the compilers matter. And it turns out that compilers have plenty of extremely useful functions that let you side step most if not all of this. People just don't use this because they want to write "portable" "standard" C. The real acceptance is to break out of that mindset.

Somehow I built an entire lisp interpreter in freestanding C that actually managed to pass UBSan just by following the above logic. I was actually surprised at first: I expected it to crash and burn, but it didn't. So if I can do it, then anyone can do it too.

A lot of the Central UB can not be defined, because they rely on detection. In order to have a well defined behaviour (by the standard or the compiler) the implementation needs to first detect that the behaviour is triggered, this is often very tricky or expensive. Its easy to define that a program should halt, if it writes outside an array, but detecting if it does can be both slow and hard to implement. There are implementations that do, but they are rarely used outside of debugging.

A better way to think about UB is as a contract between developer and implementation, so that the implementations can more easily reason about the code. How would you optimize:

(x * 2) / 2

An optimizer can optimize this out for a signed integer, because it doesn't have to consider overflow, but with a unsigned integer it can not. UB is a big reason why C is the most power efficient high level language.

  • > How would you optimize: (x * 2) / 2

    I'd do the math myself and just write x.

    I don't even use * for multiplication anymore, I use __builtin_mul_overflow and then check the result. Anyone who doesn't is gonna hit the overflow case one day, and they'll be lucky if their program isn't exploited because of it. I've been making an effort to use all the overflow checking builtins by default in most if not all cases. I've also been making Claude audit every single bare arithmetic operation in my projects. He's caught quite a few security issues already, and overflow checking dealt with them all.

    This particular contract between developer and implementation is totally worthless and doing more harm than good. It encompasses regular everyday normal things like multiplication and addition. All things that our brains literally rely on in order to reason about the code. Can't even add numbers without the compiler screwing it up.

    Programmers need to deal with overflow at all times. Can't calculate an offset without dealing with overflow. Can't calculate a size without dealing with overflow. It's simply everywhere in systems programming, which is what C was designed to do. The consequence of ignoring this is usually that your program gets mercilessly exploited.

    All this for some efficiency gains. The cost/benefit analysis is way off here. Things should be correct, first and foremost. Then the compiler should give us the necessary sharp tools to make it fast, if needed. It shouldn't be making it fast at the cost of turning the entire language into a memetic vulnerability machine.

> Strict aliasing? Union type punning. Literally documented to work in any compiler that matters, despite the holy C standard never saying so.

It does say so, actually, since C99 TC3 (DR 283).

> Unaligned access? Packed structs.

Packed structs are dangerous. You can do unaligned accesses through a packed type, but once you take the address of your misaligned int field, then you are back into UB territory. Very annoying in C++ when you try to pass the a misaligned field through what happens to be generic code that takes a const reference, as it will trigger a compiler warning. Unary operator+ is your friend.

  • > but once you take the address of your misaligned int field

    Gotta work with the structure directly by taking the address of the packed structure itself.

      struct uu64 {
          u64 value;
      } __attribute__((packed));
    
      struct uu64 unaligned;
      struct uu64 *address = &unaligned;
    
      address->value; // this works
    
      u64 *broken = &address->value; // this doesn't
    

    Taking the address of the field inside the structure essentially casts away the alignment information that was explicitly added to stop the compiler from screwing things up. So it should not be done.

    Mercifully, both gcc and clang emit address-of-packed-member warnings if it's done. So the packed structures are effectively turning silently broken nonsense code into sensible warnings. Major win.

> What stage is the "just make the compiler define the undefined" stage?

It can be left as implementation defined, which means that the compiler can't simply do arbitrary things, it needs to document what it would do.

Take, for example, signed-integer overflow: currently a compiler can simply refuse to emit the code in one spot while emitting it in another spot in the same compilation unit! Making it IB means that the compiler vendor will be forced to define what happens when a signed-integer overflows, rather than just saying, as they do now, "you cannot do that, and if you do we can ignore it, correct it, replace it or simply travel back in time and corrupt your program".

> Somehow I built an entire lisp interpreter in freestanding C that actually managed to pass UBSan just by following the above logic. I was actually surprised at first: I expected it to crash and burn, but it didn't. So if I can do it, then anyone can do it too.

Same here; I built a few non-trivial things that passed the first attempt at tooling (valgrind, UBsan with tests, fuzzing, etc) with no UB issues found.

  • Completely agree. It can, and I think it's extremely annoying that it wasn't.

    So we have the next best thing: builtins and flags. So long as those cover all the undefined behavior there is, we can live with it. Compiler gets to be "conformant" and we get to do useful things without the compiler folding the code into itself and inside out.

> People just don't use this because they want to write "portable" "standard" C

Something that bothers me is the Venn diagram of people that think abstraction is slow and error prone and people that only write portable C.

How many C implementations do you actually need to compile against? I don't think I've seen more than 3 outside Unix software from the 90s. Using non portable extensions is in fact totally doable for your application and you should probably do it, and just duplicate/triplicate code where you have to. It's not that hard to write and not hard to read.