← Back to context

Comment by greysphere

10 hours ago

The examples aren't really undefined behavior. They are examples that could become UB based on input/circumstances. Which if you are going to be that generous, every function call is UB because it could exceed stack space. Which is basically true in any language (up to the equivalent def of UB in that language). I feel like c has enough actual rough edges that deserve attention that sensationalism like this muddies folks attention (particularly novices) and can end up doing more harm than good.

Ada 83 has no UB on call stack overflow, from the reference manual :

http://archive.adaic.com/standards/83lrm/html/lrm-11-01.html

"STORAGE_ERROR This exception is raised in any of the following situations: (...) or during the execution of a subprogram call, if storage is not sufficient."

  • So it's just as useful as when your stack area ends with a page that will segfault on access, or your CPU will raise an interrupt if stack pointer goes beyond a particular address?

    It's not safe though because throwing an exception, panicking, etc, is still a denial of service. It's just more deterministic than silently overwriting the heap instead. If the program is critical then you need to be able to statically prove the full size of the stack, which you can do with C and C++ with the right tools and restrictions.

    • You're mixing specification (a language reference manual) and implementation (a given compiler, target, options, ...).

      The Ada language specification says the Ada programmer can expect any Ada compiler when used in fully compliant mode to properly raise STORAGE_ERROR when a stack overflow occurs.

      Only the Ada compiler writer has to deal with this, not every single programmer on every single program and platform (the UB behaviour of some languages).

      In the case of GCC/GNAT the compiler manual provides insight on how to be in compliant mode per target regarding stack overflow, what are the limitations if any. You have tools to monitor and analyze you Ada code in this respect too.

    • Deterministic, well-defined behavior is inherently safer than undefined behavior. It allows you to diagnose the problem and fix it. UB emphatically does not, and I don't dare to think of how many millions of person-hours are wasted every year dealing with the results.

    • A segfault is considered safe if you're talking about functional safety because it results in a return to a defined safe state (RTDSS).

      If a segfault leads to some other state you do not deem "safe", such as a single program gating access to a valuable asset with a default fail state of "allow", you just have a fundamental design flaw in your system. The safety problem is you or your AI agent, not the segfault.

That's not true at all.

First, you can define what happens when stack space is exceeded. Second not all programs need an arbitrary amount of stack space, some only need a constant amount that can be calculated ahead of time. (And some languages don't use a stack at all in their implementations.)

Your language could also offer tools to probe how much stack space you have left, and make guarantees based on that. Or they could let you install some handlers for what to do when you run out of stack space.

The examples are unequivocally UB. Full stop.

How to think of this properly is that when you have UB, you are no longer under the auspices of a language standard. Things may work fine for a time, indefinitely even. But what happens instead is you unknowingly become subject to whimsies of your toolchain (swap/upgrade compilers), architecture, or runtime (libc version differences).

You end up building a foundation on quicksand. That's the danger of UB.

  • > The examples are unequivocally UB. Full stop.

    Tbh, already the first example (unaligned pointer access) is bogus and the C standard should be fixed (in the end the list of UB in the C standard is entirely "made up" and should be adapted to modern hardware, a lot of UB was important 30 years ago to allow optimizations on ancient CPUs, but a lot of those hardware restrictions are long gone).

    In the end it's the CPU and not the compiler which decides whether an unaligned access is a problem or not. On most modern CPUs unaligned load/stores are no problem at all (not even a performance penalty unless you straddle a cache line). There's no point in restricting the entire C standard because of the behaviour of a few esoteric CPUs that are stuck in the past.

    PS: we also need to stop with the "what if there is a CPU that..." discussions. The C standard should follow the current hardware, and not care about 40 year old CPUs or theoretical future CPU architectures. If esoteric CPUs need to be supported, compilers can do that with non-standard extensions.

    • Not having unaligned access in the language allows the compiler to assume that, for basic types where the aligment is at least the size, if two addresses are different then they don't alias and writes to one can't change the result of reads from the other. That's a very useful assumption to be able to make for optimization - much more useful than yolocasting pointers in a way that could get you unaligned ones.

      3 replies →

    • I agree. I meant to elaborate more on how to think of UB.

      For most C software on x86_64, UB is "fine" with very strong bunny ears. But it is preferable for one to, shall we say, write UB intentionally rather than accidentally and unknowingly. Having an awareness of all the minefields lends for more respect for the dangers of C code, it makes one question literally everything, and that would hopefully result in more correct code, more often.

      On that note, on some RISC-V cores unaligned access can turn a single load into hundreds of instructions.

      I think the problem is just that C is under specified for what we expect a language to provide in the modern age. It is still a great language, but the edges are sharp.

    • There are still modern CPUs that don't support misaligned access. It would be insane for C to mandate that misaligned accesses are supported.

      However I do agree that just saying "the behaviour is undefined" is an unhelpful cop-out. They could easily say something like "non-atomic misaligned accesses either succeed or trap" or something like that.

      > In the end it's the CPU and not the compiler which decides whether an unaligned access is a problem or not.

      Not just the CPU - memory decides as well. MMIO devices often don't support misaligned accesses.

      9 replies →

  • The first example is dereferencing an integer pointer. That is a valid operation. Now if that pointer isn't valid (and being unaligned is one of many reasons it could be invalid) then calling the function with that invalid pointer will be UB.

    An honest discussion would be something more like 'dereferencing pointers can lead to UB on invalid pointers. Here are N examples of that. Maybe avoid using pointers. Maybe consider how other languages avoid pointers. Maybe these shouldn't be UB and instead some other class of error.' And then even more honest discussion would present the upsides of having pointers and the upsides of having these errors be UB.

    Instead, the article (and your comment) take this valid operation and presents it as invalid. Imagine you're a new programmer, you are just starting to wrap your head around pointers and you stumble across this article. You see the first example and it looks exactly what you would expect a dereference to look like. But the article claims it's wrong, and now you're confused. So you dig into the article more closely and are exposed to all these terms like UB, alignment, type coercion etc and come away more confused and scared and disinclined to understand pointers. This is classic FUD. This is a technique to manipulate, not educate.

    Pointers have pros and cons. UB has pros and cons. Let's try to educate people about them.

    • There is an important distinction here to the technical meaning of UB that is lost to many.

      UB simply means the operation you are intending to perform has no defined semantic under the ISO C specification. That is all. Understand what this means but do not read further into it. It is easy to read further into this as you have and many do, and come to incorrect conclusions, and think this MUST result in incorrect behaviour, but this is not the claim. The claim is rather than once you write UB, you are no longer writing C the language with a defined spec, and that any manner of degrees of freedom (architecture, toolchain, etc) can now cause your code that was once behaving correctly to now behave incorrectly. That is the danger.

      > That is a valid operation. Now if that pointer isn't valid (and being unaligned is one of many reasons it could be invalid) then calling the function with that invalid pointer will be UB.

      This is incorrect. The moment you express this in source code, it is already UB wrt to the C abstract machine.

      6.3.2.3. 755 If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined.

      https://c0x.shape-of-code.com/6.3.2.3.html

      The important distinction is to KNOW this is still UB; whether the operation yields the expected behaviour on your platform and architecture is completely a separate question.

      The reason this is of utmost important is because the C compiler operates on the C abstract machine.

      If you violate language invariants, the compiler can--keyword can--emit WRONG code and it will be CORRECT to do so because C unfortunately allows it to. When this happens it's silent and deadly and it's a pain to debug. The point of all this seeming language lawyering is not FUD, it is genuine frustration with these footguns of the language that we are trying to share with others. Understanding UB correctly really is what separates those that know C and those that "know" C.

      Things will work and then they won't. This can be fine for most cases but not fine for others. If you use C in 2026 you need to understand this.

      > come away more confused and scared

      This is the correct take. One aught to be more confused and scared after learning about UB; the language simply leaves things under-specified and it is up to the developer to understand they are engaging in UB.

      Once UB is acknowledged, one aught to impress upon themselves the software they build is dependent ever more on the whims of their particular compiler (clang/gcc), compiler flags (optimizations), architecture, and runtime environment.

      1 reply →