← Back to context

Comment by uecker

7 hours ago

One has to add that from the 218 UB in the ISO C23, 87 are in the core language. From those we already removed 26 and are in progress of removing many others. You can find my latest update here (since then there was also some progress): https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3529.pdf

A lot of that work is basically fixing documentation bugs, labelled "ghosts" in your text. Places where the ISO document is so bad as a description of C that you would think there's Undefined Behaviour but it's actually just poorly written.

Fixing the document is worthwhile, and certainly a reminder that WG21's equivalent effort needs to make the list before it can even begin that process on its even longer document, but practical C programmers don't read the document and since this UB was a "ghost" they weren't tripped by it. Removing items from the list this way does not translate to the meaningful safety improvement you might imagine.

There's not a whole lot of movement there towards actually fixing the problem. Maybe it will come later?

  • > practical C programmers don't read the document and since this UB was a "ghost" they weren't tripped by it

    I would strongly suspect that C compiler implementers very much do read the document, though. Which, as far as I can see, means "ghosts" could easily become actual UB (and worse, sneaky UB that you wouldn't expect.)

    • If I understand correctly, the "ghosts" are vacuously UB. As in, the standard specifies that if X, then UB, but X can in fact never be true according to the standard.

    • The previous language might cause a C compiler developer to get very confused because it seems as though they can choose something else but what it is isn't specified, but almost invariably eventually they'll realise oh, it's just badly worded and didn't mean "should" there.

      It's like one of those tricky self-referential parlor box statements. "The statement on this box is not true"? Thanks I guess. But that's a game, the puzzles are supposed to be like that, whereas the mission of the ISO document was not to confuse people, so it's good that it is being improved.

      1 reply →

  • Fixing the actual problems is work-in-progress (as my document also indicates), but naturally it is harder.

    But the original article also complains about the number of trivial UB.

And yet, I see P1434R0 seemingly trying to introduce new undefined behavior, around integer-to-pointer conversions, where previously you had reasonably sensible implementation defined behavior (the conversions “are intended to be consistent with the addressing structure of the execution environment").

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p14...

  • Pointer provenance already existed before, but the standards were contradictory and incomplete. This is an effort to more rigorously nail down the semantics.

    i.e., the UB already existed, but it was not explicit had to be inferred from the whole text and the boundaries were fuzzy. Remember that anything not explicitly defined by the standard, is implicitly undefined.

    Also remember, just because you can legally construct a pointer it doesn't mean it is safe to dereference.

    • The current standard still says integer-to-pointer conversions are implementation defined (not undefined) and furthermore "intended to be consistent with the addressing structure of the execution environment" (that's a direct quote).

      I have an execution environment, Wasm, where doing this is pretty well defined, in fact. So if I want to read the memory at address 12345, which is within bounds of the linear memory (and there's a builtin to make sure), why should it be undefined behavior?

      And regarding pointer provenance, why should going through a pointer-to-integer and integer-to-pointer conversions try to preserve provenance at all, and be undefined behavior in situations where that provenance is ambiguous?

      The reason I'm using integer (rather than pointer) arithmetic is precisely so I don't have to be bound by pointer arithmetic rules. What good purpose does it serve for this to be undefined (rather than implementation defined) beyond preventing certain programs to be meaningfully written at all?

      I'm genuinely curious.

      2 replies →

    • Pointer provenance was certainly not here in the 80s. That's a more modern creation seeking to extract better performance from some applications at a cost of making others broken/unimplementable.

      It's not something that exists in the hardware. It's also not a good idea, though trying to steer people away from it proved beyond my politics.

      13 replies →