Comment by stevenhuang

2 hours ago

There is an important distinction here to the technical meaning of UB that is lost to many.

UB simply means the operation you are intending to perform has no defined semantic under the ISO C specification. That is all. Understand what this means but do not read further into it. It is easy to read further into this as you have and many do, and come to incorrect conclusions, and think this MUST result in incorrect behaviour, but this is not the claim. The claim is rather than once you write UB, you are no longer writing C the language with a defined spec, and that any manner of degrees of freedom (architecture, toolchain, etc) can now cause your code that was once behaving correctly to now behave incorrectly. That is the danger.

> That is a valid operation. Now if that pointer isn't valid (and being unaligned is one of many reasons it could be invalid) then calling the function with that invalid pointer will be UB.

This is incorrect. The moment you express this in source code, it is already UB wrt to the C abstract machine.

6.3.2.3. 755 If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined.

https://c0x.shape-of-code.com/6.3.2.3.html

The important distinction is to KNOW this is still UB; whether the operation yields the expected behaviour on your platform and architecture is completely a separate question.

The reason this is of utmost important is because the C compiler operates on the C abstract machine.

If you violate language invariants, the compiler can--keyword can--emit WRONG code and it will be CORRECT to do so because C unfortunately allows it to. When this happens it's silent and deadly and it's a pain to debug. The point of all this seeming language lawyering is not FUD, it is genuine frustration with these footguns of the language that we are trying to share with others. Understanding UB correctly really is what separates those that know C and those that "know" C.

Things will work and then they won't. This can be fine for most cases but not fine for others. If you use C in 2026 you need to understand this.

> come away more confused and scared

This is the correct take. One aught to be more confused and scared after learning about UB; the language simply leaves things under-specified and it is up to the developer to understand they are engaging in UB.

Once UB is acknowledged, one aught to impress upon themselves the software they build is dependent ever more on the whims of their particular compiler (clang/gcc), compiler flags (optimizations), architecture, and runtime environment.

Maybe I'm misunderstanding. Here is what I'm trying to say.

"Accessing an object which is not correctly aligned" - this is UB

"As an example of this, take this code: ..." - this (code) is not UB.

Is this incorrect somehow?

You could interpret the second sentence as 'under the assumption of an unaligned pointer, let's look at what this seemingly innocuous (and correct) code does.'

But that's not what they did. They presented that code as if it's incorrect (following the whole premise of the article 'Everything in c is UB'). That's what the whole article does, they take a topic with real concerns, then present 'normal' code, and then imply the code is the issue (and therefore the language), not the premise.

You know what would be better, show an example that clearly shows the complete path for the premise to the issue. Ie show some code that generates an unaligned pointer and then uses it. Why did the author not do that? Surprise, because it's actually pretty hard to write code that's 'guaranteed' unaligned behavior.

    int foo[10];
    int *bar = (int *)(((int)&foo) + 1);

Is this unaligned access? You don't know because you don't know the size of int.

And in fact that's the whole point - things surrounding alignment and sizes are given the 'privilege' of being undefined in c so that compilers are easier to write. It's very debatable if this was/is a good idea, but that's where the debate should be, not illusorily ascribed to derefing pointers.

If I'm misunderstanding, please let me know. Specifically, if you're claiming (1) either the literal code in the first box of the article is UB, or (2) please write some literal code that is UB in the vein of the first claim of the article. I think that would help me bridge the gap that we seem to be having.