Comment by greysphere

2 hours ago

Maybe I'm misunderstanding. Here is what I'm trying to say.

"Accessing an object which is not correctly aligned" - this is UB

"As an example of this, take this code: ..." - this (code) is not UB.

Is this incorrect somehow?

You could interpret the second sentence as 'under the assumption of an unaligned pointer, let's look at what this seemingly innocuous (and correct) code does.'

But that's not what they did. They presented that code as if it's incorrect (following the whole premise of the article, 'Everything in C is UB'). That's what the whole article does: it takes a topic with real concerns, presents 'normal' code, and then implies the code is the issue (and therefore the language), not the premise.

You know what would be better? An example that shows the complete path from the premise to the issue, i.e. some code that generates an unaligned pointer and then uses it. Why did the author not do that? Surprise: because it's actually pretty hard to write code that's 'guaranteed' to perform an unaligned access.

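    #include <stdint.h>  /* uintptr_t */

    int foo[10];
    int *bar = (int *)((uintptr_t)&foo + 1);  /* uintptr_t, not int: int can truncate a 64-bit address */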

Is this unaligned access? You don't know, because you don't know the size (or alignment requirement) of int. (Not to mention it looks ridiculous. By only showing 'reasonable' code as the example, the article suppresses the common 'uh just don't do that' criticism.)
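
A minimal sketch of how that question could be asked without dereferencing anything. The helper name `is_aligned_for_int` is made up for illustration, and the sketch assumes a C11 compiler plus a target where converting a pointer to `uintptr_t` preserves the low address bits (that conversion is implementation-defined, though mainstream flat-memory targets behave this way).

```c
#include <stdalign.h>  /* alignof (C11) */
#include <stdbool.h>   /* bool */
#include <stdint.h>    /* uintptr_t */

/* Hypothetical helper: true if `p` satisfies int's alignment requirement.
 * Assumes the pointer-to-integer conversion keeps the low address bits,
 * which is implementation-defined but typical on flat-memory targets. */
static bool is_aligned_for_int(const void *p) {
    return (uintptr_t)p % alignof(int) == 0;
}
```

Under those assumptions, `is_aligned_for_int((const char *)foo + 1)` is false wherever `alignof(int)` is greater than 1 and true on a platform where it is 1, which is exactly the ambiguity in question.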

And in fact the ambiguity of alignments and sizes is the whole point - they are given the privilege/footgun of being implementation-defined in C so that compilers are easier to write. It's very debatable if this was/is a good idea, but that's where the debate should be, not illusorily ascribed to derefing pointers.

If I'm misunderstanding, please let me know. Specifically, either (1) confirm that you're claiming the literal code in the first box of the article is UB, or (2) please write some literal code that is UB in the vein of the first claim of the article. I think that would help me bridge the gap that we seem to be having.

stevenhuang 15 minutes ago

You are beginning to understand. Yes, surprisingly, it is (1) that is being claimed.

The mere expression alone is UB. Yes, you read that right. In source code, it's already UB. Why? Because the ISO spec defined UB that way. But you see, what this means in practice, i.e. whether "it works", is an entirely separate question and would be specific to toolchain, hardware, runtime, the alignment of the pointer in question, blah blah.

There is nuance here, and that's why this topic is debated to death, because it's hard to explain and it is genuinely complex.

When people say something is UB, they mean to say that the behaviour is undefined--wrt ISO C.

The behaviour that actually matters IS defined wrt toolchain, hardware, runtime, alignment of pointer in question.

But that's exactly it--the latter is not what we mean when we say something is UB; we are talking about the ISO C spec. The important follow-up question, when knowingly invoking UB, is whether your environment is "correct", because you have now crossed into a realm entirely outside the auspices of the ISO C spec. Ergo, you are now in UB land; what you thought was the foundation of your codebase, the ISO C spec, has now turned into quicksand.

It is this implied, undocumented dependence on factors external to the source code that is a huge source of bugs and surprise.

So take this example from the article. Yes, it is UB by construction.

    int foo(const int* p) {
       return *p;
    }

Why?

> Because the compiler is not obligated to generate assembly instructions that work on unaligned pointers. Because it’s UB.

Does it actually work though? It might and it might not: there is simply no guarantee from the language. But that's all it says. It may very well work on your arch and platform and toolchain, indefinitely. But, circling back, code written like this is brittle, and that is why UB is to be avoided.
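
For contrast, a minimal sketch of the usual way to stay inside what ISO C defines when the bytes might not be suitably aligned: copy them into a real `int` with `memcpy` instead of dereferencing an `int *`. The name `load_int` is hypothetical and does not come from the article.

```c
#include <string.h>  /* memcpy */

/* Hypothetical helper: read an int from `src`, which may have any alignment.
 * `src` must point to at least sizeof(int) readable bytes. */
static int load_int(const void *src) {
    int value;
    memcpy(&value, src, sizeof value);  /* byte-wise copy: no alignment requirement on src */
    return value;
}
```

On targets that permit unaligned loads, optimizing compilers typically lower this to a single load instruction, so the defined version usually costs nothing extra there.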

And to your point:

> that's where the debate should be, not illusorily ascribed to derefing pointers.

But that is where the debate is. People just do not understand what UB actually means. The article is correct: everything in C is UB. The takeaway is not that all C code is therefore irredeemably broken (well, to some people it does mean that, anywho..). The takeaway is that most C code IS in fact more delicate than one may originally believe, because ISO C is under-specified to allow for specialization dependent on toolchain/arch/hardware/what have you.

So it is incumbent on the developer when writing C to correctly acknowledge when they are invoking UB, and to do so intentionally with the awareness that things may just randomly break one day.
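
A hedged sketch of what "intentionally" could look like for the environmental part: write the assumptions down so the build fails when they stop holding, rather than the program. The specific values (a 4-byte, 4-byte-aligned `int`) are assumptions picked for illustration, not something ISO C or the article guarantees. Note this only pins down implementation-defined properties; it does not turn UB into defined behaviour.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignof (C11) */

/* Assumed for illustration only: this code base targets platforms where int
 * is 4 bytes with 4-byte alignment. If that stops being true, compilation
 * fails here with the message below. */
static_assert(sizeof(int) == 4, "code assumes a 4-byte int");
static_assert(alignof(int) == 4, "code assumes int is 4-byte aligned");
```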