
Comment by stevenhuang

1 hour ago

Edit: I think part of the confusion is that we were addressing different parts of the article's first example. You were referencing the int foo(..) snippet (which I agree has no UB), but I was referencing the parse_packet() snippet (which has UB by construction), which is also part of that first example :).

You are beginning to understand. Yes, surprisingly, it is (1) that is being claimed.

The conversion alone is UB, before any dereference happens. Yes, you read that right. Why? Because ISO C defines it that way: converting a pointer to an object type whose alignment requirement it does not meet is undefined (C11 6.3.2.3p7). But what this means in practice, i.e. whether "it works", is an entirely separate question, and the answer is specific to the toolchain, hardware, runtime, the alignment of the pointer in question, and so on.

There is nuance here, and that's why this topic is debated to death: it's hard to explain, and it is genuinely complex.

When people say something is UB, they mean that the behaviour is undefined with respect to ISO C.

The behaviour that actually matters in practice IS determined, by the toolchain, the hardware, the runtime, and the alignment of the pointer in question.

But that's exactly it: the latter is not what we mean when we say something is UB. When we say something is UB, we are talking about the ISO C spec. The important follow-up question when knowingly invoking UB, then, is how to ensure your environment is "correct", because you have now crossed into territory entirely outside the auspices of the ISO C spec. Ergo, you are now in UB land; what you thought was the foundation of your codebase, the ISO C spec, has turned into quicksand.

It is this implicit, undocumented dependence on factors external to the source code that is a huge source of bugs and surprises.

So take this example from the article. Yes, it is UB by construction.

(Edit: I copied the wrong fragment initially. If you were talking about the int foo(const int* p) fragment, then yes, that block is not UB by construction.)

    bool parse_packet(const uint8_t* bytes) {
        const int* magic_intp = (const int*)bytes;   // UB!
        int magic_raw = foo(magic_intp);  // Probably crashes on SPARC.
        int magic = ntohl(magic_raw); // this is fine, at least.
        // […]
    }

Why?

> Because the compiler is not obligated to generate assembly instructions that work on unaligned pointers. Because it’s UB.

Does it actually work, though? It might, and it might not: the language simply gives no guarantee. But that's all it says. It may very well work on your arch, platform, and toolchain indefinitely. But, circling back: code written like this is brittle, and that brittleness is why UB is to be avoided.

And to your point:

> that's where the debate should be, not illusorily ascribed to derefing pointers.

But that is where the debate is. People just do not understand what UB actually means. The article is correct: everything in C is UB. The takeaway is not that all C code is therefore irredeemably broken (well, to some people it does mean that, but anyway). The takeaway is that most C code IS in fact more delicate than one might believe, because ISO C is deliberately under-specified to allow specialization for the toolchain, architecture, hardware, and so on.

So it is incumbent on the developer writing C to recognize when they are invoking UB, and to do so intentionally, aware that things may just randomly break one day.