
Comment by saagarjha

8 hours ago

Turning undefined behavior into implementation defined behavior is rarely a fix, though.

It's a fix that removes the most pointy part of UB.

"Going past the end of the array results in addressing arbitrary values" I can live with. "Going past the end of an array results in anything happening" is a hard sell.

  • Is that really a meaningful distinction?

    Once you are addressing arbitrary values you are firmly in the realm of "anything happening" in practice, but you've now given up optimization opportunities. As has been repeatedly demonstrated over the years, once memory safety breaks it is practically impossible to make any guarantees about program behavior.

    • Yes, it's a meaningful distinction. No, you are not into "anything happening" in practice.

      Your compiler emitting a load operation and it failing isn't "anything". The failure being handled by code that the compiler authors can't predict doesn't make it "anything".

      And if you lose optimization opportunities because of this, it's because your optimization is broken. By the way, if you lose optimization opportunities because of this, that means the two versions of the code are meaningfully different and you knew it all along.

  • I think it’s a really easy sell, actually: if you go past the end of the array far enough you end up accessing the stack which includes parts of the program like “where does this function return to” or “what is the index used to perform this access” or “there is no page mapped there”. None of these are arbitrary values.

    • The "anything can happen" means that the compiler can simply and silently refuse to emit the code that does the access.

      Documenting that the instructions to access will always be eliminated makes it easier to predict what will happen.


  • Are you talking about creating a pointer more than one item past the end of an array, or dereferencing that pointer? Both are currently UB.

    For the former, I kinda get it. It may need to be there for cases like a segmented address space, where p+10 could actually be a value less than p in the eventually generated assembly. Maybe it should be fine to create such a pointer, but have it be an "indeterminate value" or whatever if you try to compare that pointer to anything? I don't know enough about compiler internals to say one way or the other.

    Dereferencing, though, can only be UB. There may not be a "value" behind that address. There may be a motor that's been I/O mapped, or a self destruct button.
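
To make the creating-versus-dereferencing split concrete, here is a minimal C sketch that stays entirely inside defined behavior. It forms the one-past-the-end pointer (which C permits) and uses it only as a loop bound, never dereferencing it. The names `sum_ints` and `demo_sum` are hypothetical, made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n ints, using a one-past-the-end pointer purely as the loop bound.
   Forming `end` is defined; dereferencing it, or forming first + n + 1,
   would not be. */
int sum_ints(const int *first, size_t n) {
    assert(first != NULL);
    const int *end = first + n;   /* exactly one past the last element */
    int total = 0;
    for (const int *p = first; p != end; ++p) {
        total += *p;              /* p is strictly before end here */
    }
    return total;
}

/* Hypothetical driver so the sketch can be exercised on its own. */
int demo_sum(void) {
    int values[4] = {1, 2, 3, 4};
    return sum_ints(values, 4);
}
```

Anything further out than `first + n` is where the thread's question starts: even computing that address, without ever reading through it, is already outside what the standard defines.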

    • I'm not saying that the result of the dereference be known, I'm saying that the instructions to do the dereference be always emitted.

      Right now, if a dereference results in UB, the compiler may omit it entirely.
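
For what it's worth, the closest thing C has today to "always emit the access" is a volatile read. The sketch below assumes a mainstream compiler that treats an access through a volatile-qualified lvalue as an observable side effect, which is how GCC and Clang behave in practice. It only uses in-bounds indices: volatile does not make an out-of-bounds access defined, it only keeps the load from being optimized away. The names `read_plain`, `read_emitted`, and `demo_read` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Plain read: the compiler may fold it, reorder it, or drop it entirely
   when the result is unused. */
int read_plain(const int *p) {
    return *p;
}

/* Volatile read: the access counts as a side effect, so the compiler is
   expected to emit an actual load instruction for it. */
int read_emitted(const int *p) {
    return *(const volatile int *)p;
}

/* Hypothetical driver; both indices are in bounds on purpose. */
int demo_read(void) {
    int values[3] = {7, 8, 9};
    size_t i = 2;
    assert(i < 3);  /* bounds check kept explicit */
    return read_emitted(&values[i]) + read_plain(&values[0]);
}
```

The proposal in this subthread amounts to asking for the `read_emitted` treatment by default, without having to spell out `volatile` at every access.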
