← Back to context

Comment by veltas

11 hours ago

From the ANSI C standard:

  3.16 undefined behavior: Behavior, upon use of a nonportable or erroneous program construct, of erroneous data, or of indeterminately valued objects, for which this International Standard imposes no requirements.  Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message).

Is it just me or did compiler writers apply overly legalistic interpretation to the "no requirements" part in this paragraph? The intent here is extremely clear, that undefined behavior means you're doing something not intended or specified by the language, but that the consequence of this should be somewhat bounded or as expected for the target machine. This is closer to our old school understanding of UB.

By 'bounded', this obviously ignores the security consequences of e.g. buffer overflows, but just because UB can be exploited doesn't mean it's appropriate for e.g. the compiler to exploit it too, that clearly violates the intent of this paragraph.

> but that the consequence of this should be somewhat bounded or as expected for the target machine.

Aren't "unpredictable results" and "no requirements" contrary to the idea that the behavior would be "somewhat bounded"?

  • Notice though "ignoring the situation" thru "documented manner characteristic of the environment". Even though truly you can read this in an uncharitable way, you could also try and understand the intent of this paragraph, and I think reading it for its intents is always the best way to interpret a language standard when the wording is ambiguous or soft, especially if you're writing a compiler.

    I don't think you could sincerely argue that this definition intends to allow the compiler to totally rewrite your code because of one guaranteed UB detected on line 5, just that it would be good to print a diagnostic if it can be detected, and if not to do what's "characteristic of the environment". Does that make sense?

    • Ex falso quodlibet.

      Bounding UB would be a nice idea, or at least prohibiting time-traveling UB (and there is an effort in that direction). But properly specifing it is actually hard.

      2 replies →

    • Reading for intent is pragmatic.

      Reading adversarially is what people do who are looking for ways that something can be abused, from an offensive or defensive position.

      Personally I am tired of the entire topic.

      3 replies →

    • > Notice though "ignoring the situation" thru "documented manner characteristic of the environment".

      I noticed that. Those are 100% consistent & implied by the parts of the standard I quoted that you are ignoring, though.

      What you're doing is:

      - Arguing is that those phrases describe the totality of the implications, rather than mere examples, without providing anything to base this method of argumentation on.

      - Completely ignoring the other phrases I quoted, which (taken at face value) contradict your reading.

      - Claiming that anyone who disagrees is being insincere(?) and reading the standard uncharitably.

      - Not even attempting to support this line of reasoning through other arguments.

      So you're not only asking people to read contradictions into the standard, but also insinuating that people who don't are not arguing in good faith. That... honestly isn't a winning strategy.

      Note that I'm not even saying your conclusion regarding their intent is necessarily wrong. I'm just saying your argument is bad. And that there is a difference between what the rules are and what some people believe their authors intended them to be.

      If I wanted to argue your position, I would look for other parts of the standard where they do what you're claiming. That is, where the literal meaning of the wording would be crazy, and which would clearly contract what everyone believes the authors of the standard intended it to mean. Then you would at least have some basis for extrapolating that line of reasoning to this paragraph. At that point you might at least get an acknowledgment from the other side that the standard is unclear and/or has a defect, even if they didn't agree with your take on what requirements it imposes as-written.

      > I don't think you could sincerely argue that this definition intends to allow the compiler to totally rewrite your code because of one guaranteed UB detected on line 5,

      I'm not sure if you're exaggerating ("totally"?), being sloppy, or misunderstanding, or if you actually mean this literally, but I already don't believe it does that, and I have never seen any compiler interpret it that way either. Sorry, but you're going to have to be more precise and pedantic here so you actually have something realistic to argue against. Right now it looks like you have an impression of UB that doesn't match reality.

Author here.

I touched on this in the "it's not about optimizations" section. It's not the compiler is out to get you. It's that you told it to do something it cannot express.

It's like if you slipped in a word in French, and not being programmed for French, it misheard the word as a false friend in English. The compiler had no way to represent the French word in it's parse tree.

So no, it's not overly legalistic. Like if the compiler knows that this hardware can do unaligned memory access, but not atomic unaligned access, should it check for alignment in std::atomic<int> ptr but not in int ptr? Probably not, right?

  • It's not that your article specifically discusses this aspect, but I think it's an important part of the conversation that's being overlooked by commentators, that we've twisted the original intent of UB and made unnecessary work for ourselves. There's been too much scaremongering about UB that's gone beyond the real concerns. If you only fear UB and don't understand it then you are worse off for trying to write safe C or C++.

The behaviour is bounded by the capability of your machine. It is unlikely that your desktop computer launches a nuclear missile, unless you worked for it to be able to do that.

> Is it just me or did compiler writers apply overly legalistic interpretation to the "no requirements" part in this paragraph?

I've (fruitlessly) had this discussion on HN before - super-aggressive optimisations for diminishing rewards are the norm in modern compilers.

In old C compilers, dereferencing NULL was reliable - the code that dereferenced NULL will always be emitted. Now, dereferencing NULL is not reliable, because the compiler may remove that and the program may fail in ways not anticipated (i.e, no access is attempted to memory location 0).

The compiler authors are on the standard, and they tend to push for more cases of UB being added rather than removing what UB there is right now (for exampel, by replacing with Implementation Defined Behaviour).