Comment by stevenhuang

10 hours ago

The examples are unequivocally UB. Full stop.

The right way to think about this is that once you have UB, you are no longer under the auspices of a language standard. Things may work fine for a time, indefinitely even. But you have unknowingly become subject to the whims of your toolchain (swapping or upgrading compilers), architecture, or runtime (libc version differences).

You end up building a foundation on quicksand. That's the danger of UB.

> The examples are unequivocally UB. Full stop.

Tbh, the very first example (unaligned pointer access) is already bogus, and the C standard should be fixed. In the end the list of UB in the C standard is entirely "made up" and should be adapted to modern hardware: a lot of UB was important 30 years ago to allow optimizations on ancient CPUs, but many of those hardware restrictions are long gone.

In the end it's the CPU and not the compiler which decides whether an unaligned access is a problem or not. On most modern CPUs unaligned loads/stores are no problem at all (not even a performance penalty unless you straddle a cache line). There's no point in restricting the entire C standard because of the behaviour of a few esoteric CPUs that are stuck in the past.
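
For reference, ISO C already has a portable spelling for a read from a possibly unaligned address: copy the bytes with `memcpy`. A minimal sketch, where `load_u32` is a hypothetical helper name (not from the article). Mainstream optimizing compilers typically turn this into a single load on targets that permit unaligned access, but that is a quality-of-implementation matter, not something the standard promises.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a uint32_t from an address that may be unaligned.
   memcpy copies bytes, so no misaligned uint32_t pointer is ever formed. */
uint32_t load_u32(const void *src) {
    uint32_t value;
    assert(src != NULL);
    memcpy(&value, src, sizeof value);
    return value;
}
```

Calling `load_u32(buf + 1)` on a byte buffer is defined on every conforming implementation, whatever the CPU does with misaligned loads.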

PS: we also need to stop with the "what if there is a CPU that..." discussions. The C standard should follow the current hardware, and not care about 40 year old CPUs or theoretical future CPU architectures. If esoteric CPUs need to be supported, compilers can do that with non-standard extensions.

  • Not having unaligned access in the language allows the compiler to assume that, for basic types where the alignment is at least the size, if two addresses are different then they don't alias and writes to one can't change the result of reads from the other. That's a very useful assumption to be able to make for optimization - much more useful than yolocasting pointers in a way that could get you unaligned ones.
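
    A rough sketch of the reasoning this enables (`store_both` is a made-up name, and it assumes the usual case where the alignment of `int` equals its size). Both pointers must be correctly aligned, so two distinct `int` pointers cannot partially overlap, and the compiler is allowed to fold the final read to a constant.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical example: stores through two int pointers. */
    int store_both(int *p, int *q) {
        assert(p != NULL && q != NULL);
        if (p == q) {
            return -1;  /* same object: the two stores would alias */
        }
        *q = 1;
        *p = 2;         /* a distinct, aligned int cannot overlap *q */
        return *q;      /* so this read may be folded to the constant 1 */
    }
    ```

    If misaligned `int` pointers were allowed, `p` could point one byte past `q`, and the store through `p` would clobber part of `*q`.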

    • > if two addresses are different ...

      Eh, if the compiler knows that two addresses are different at compile time, it also knows how big the difference is.

  • Undefined means that ISO C doesn't define the behavior. An implementation is free to do so.

    • If they do, that is no longer an implementation of C. It is a dialect of C, and there are many (GNU C being the most popular), but there are real drawbacks to using dialects.

      This is in contrast to the other category that exists, which is "implementation-defined".

  • I agree. I meant to elaborate more on how to think of UB.

    For most C software on x86_64, UB is "fine" with very strong bunny ears. But it is preferable to, shall we say, write UB intentionally rather than accidentally and unknowingly. Being aware of all the minefields fosters more respect for the dangers of C code. It makes one question literally everything, and that would hopefully result in more correct code, more often.

    On that note, on some RISC-V cores unaligned access can turn a single load into hundreds of instructions.

    I think the problem is just that C is underspecified for what we expect a language to provide in the modern age. It is still a great language, but the edges are sharp.

  • There are still modern CPUs that don't support misaligned access. It would be insane for C to mandate that misaligned accesses are supported.

    However I do agree that just saying "the behaviour is undefined" is an unhelpful cop-out. They could easily say something like "non-atomic misaligned accesses either succeed or trap" or something like that.

    > In the end it's the CPU and not the compiler which decides whether an unaligned access is a problem or not.

    Not just the CPU - memory decides as well. MMIO devices often don't support misaligned accesses.

    • > They could easily say something like "non-atomic misaligned accesses either succeed or trap" or something like that.

      That means that the compiler must emit the read, even if the value is already known or never used, as it might trap. There is a reason for the UB!

    • On hardware that doesn't support it, misaligned loads could be compiled to multiple loads and shifts. Probably not great for performance, and it doesn't work if you need it to be atomic, but it isn't impossible.
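
      Roughly what that lowering looks like when written by hand, as a sketch assuming a little-endian byte order (`load_u32_le` is a hypothetical name). Each byte access is trivially aligned, so this works on any address, but it is four separate loads and therefore not atomic.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical helper: assemble a little-endian uint32_t from four
         single-byte loads combined with shifts. */
      uint32_t load_u32_le(const unsigned char *bytes) {
          assert(bytes != NULL);
          return (uint32_t)bytes[0]
               | ((uint32_t)bytes[1] << 8)
               | ((uint32_t)bytes[2] << 16)
               | ((uint32_t)bytes[3] << 24);
      }
      ```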

The first example is dereferencing an integer pointer. That is a valid operation. Now if that pointer isn't valid (and being unaligned is one of many reasons it could be invalid) then calling the function with that invalid pointer will be UB.

An honest discussion would be something more like 'dereferencing pointers can lead to UB on invalid pointers. Here are N examples of that. Maybe avoid using pointers. Maybe consider how other languages avoid pointers. Maybe these shouldn't be UB and instead some other class of error.' And then an even more honest discussion would present the upsides of having pointers and the upsides of having these errors be UB.

Instead, the article (and your comment) take this valid operation and present it as invalid. Imagine you're a new programmer, you are just starting to wrap your head around pointers and you stumble across this article. You see the first example and it looks exactly like what you would expect a dereference to look like. But the article claims it's wrong, and now you're confused. So you dig into the article more closely and are exposed to all these terms like UB, alignment, type coercion etc. and come away more confused and scared and disinclined to understand pointers. This is classic FUD. This is a technique to manipulate, not educate.

Pointers have pros and cons. UB has pros and cons. Let's try to educate people about them.

  • There is an important distinction here to the technical meaning of UB that is lost to many.

    UB simply means the operation you are intending to perform has no defined semantics under the ISO C specification. That is all. Understand what this means but do not read further into it. It is easy to read further into this, as you have and many do, and come to incorrect conclusions, and think this MUST result in incorrect behaviour, but this is not the claim. The claim is rather that once you write UB, you are no longer writing C the language with a defined spec, and that any number of degrees of freedom (architecture, toolchain, etc.) can now cause your code that was once behaving correctly to behave incorrectly. That is the danger.

    > That is a valid operation. Now if that pointer isn't valid (and being unaligned is one of many reasons it could be invalid) then calling the function with that invalid pointer will be UB.

    This is incorrect. The moment you express this in source code, it is already UB wrt the C abstract machine.

    6.3.2.3. 755 If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined.

    https://c0x.shape-of-code.com/6.3.2.3.html

    The important distinction is to KNOW this is still UB; whether the operation yields the expected behaviour on your platform and architecture is completely a separate question.
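
    A hedged sketch of checking the precondition that 6.3.2.3 cares about before doing the conversion. `is_aligned_for_int` is a hypothetical helper, and it assumes the common, implementation-defined behaviour where converting a pointer to `uintptr_t` yields its numeric address.

    ```c
    #include <assert.h>
    #include <stdalign.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper: nonzero if p is suitably aligned to be
       converted to int * (assumes uintptr_t holds the plain address). */
    int is_aligned_for_int(const void *p) {
        assert(p != NULL);
        return (uintptr_t)p % alignof(int) == 0;
    }
    ```

    Only when the check passes is converting `p` to `int *` covered by the standard; otherwise the conversion itself is where the UB occurs, before any dereference happens.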

    The reason this is of utmost importance is that the C compiler operates on the C abstract machine.

    If you violate language invariants, the compiler can--keyword can--emit WRONG code and it will be CORRECT to do so because C unfortunately allows it to. When this happens it's silent and deadly and it's a pain to debug. The point of all this seeming language lawyering is not FUD, it is genuine frustration with these footguns of the language that we are trying to share with others. Understanding UB correctly really is what separates those that know C and those that "know" C.

    Things will work and then they won't. This can be fine for most cases but not fine for others. If you use C in 2026 you need to understand this.

    > come away more confused and scared

    This is the correct take. One ought to be more confused and scared after learning about UB; the language simply leaves things underspecified and it is up to the developer to understand when they are engaging in UB.

    Once UB is acknowledged, one ought to impress upon oneself that the software one builds depends ever more on the whims of a particular compiler (clang/gcc), compiler flags (optimizations), architecture, and runtime environment.