← Back to context

Comment by dmitrygr

11 hours ago

I stoped reading about here:

    > bool parse_packet(const uint8_t* bytes) {
    >   const int* magic_intp = (const int*)bytes;   // UB!

Author, if you are reading this, please cite the spec section explaining that this is UB. Dereferencing the produced pointer may be UB, but casting itself is not, since uint8_t is ~ char and char* can be cast to and from any type.

you might try to argue that uint8_t is not necessarily char, and while it is true that implementations of C can exist where CHAR_BIT > 8, but those do not have uint8_t defined (as per spec), so if you have uint8_t, then it is "unsigned char", which makes this cast perfectly safe and defined as far as i can tell. Of course CHAR_BIT is required to be >= 8, so if it is not >8, it is exactly 8. (In any case, whether uint8_t is literally a typedef of unsigned char is implementation-defined and not actually relevant to whether the cast itself is valid -- it is)

The issue is not type punning (itself a very common source of UB), but the fact that the `bytes` pointer might not be int-aligned. The spec is clear that the creation (not just the dereferencing) of an unaligned pointer is UB, see 6.3.2.3 paragraph 7 of the C11 (draft) spec.

Of course, this exchange just demonstrates the larger point, that even a world-class expert in low level programming can easily make mistakes in spotting potential UB.

  • > Of course, this exchange just demonstrates the larger point, that even a world-class expert in low level programming can easily make mistakes in spotting potential UB.

    A "world-class expert in low level programming" knows that unaligned memory accesses are no problem anymore on most modern CPUs, and that this particular UB in the C standard is bogus and needs to fixed ;)

    • … it’s only UB if the pointer is actually misaligned. It’s not possible to tell from these two lines whether that’s the case.

  • C of course is ancient. It remembers the Cambrian explosion of CPU architectures, twelve-bit bytes and everything like that. I wonder if it is possible to codify some pragmatic subset of it that works nicely on currently available CPUs. Cause the author of the piece goes back in time to prove his point (SPARCs and Alphas).

  • That cast is valid. Spec does not guarantee same bit sequence for resulting pointer and source pointer. But as the cast is explicitly allowed, it is not UB. Compiler is free to round the pointer down. Or up. Or even sideways. All ok. Dereferencing it — indeed not ok. But the cast is explicitly allowed and not UB.

    Pointer casts changing pointer bit sequences is common on weird platforms (eg: some TI DSPs, PIC, and aarch64+PAC). And it is valid as per spec. Pointer assignment is not required to be the same as memcpy-ing the pointer unto a pointer to another type.

    You misunderstood the spec. No promises are made that that cast copies the pointer bit for bit (and thus creates an invalid pointer). Therefore, your objection to invalid pointers is null and void. :)

    • I'm not assuming anything about bit representations. In this case, the spec language is quite clear and unambiguous.

      6.3.2.3 paragraph 7: A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned[footnote 68]) for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

      This is a subsection of section 6.3 which describes conversions, which include both implicit and conversions from a cast operation. This language is not saying anything about bit representations or derefencing.

      I happen to be wearing my undefined behavior shirt at the moment, which lends me an extra layer of authority. I'm at RustWeek in Utrecht, and it's one of my favorite shirts to wear at Rust conferences. But let's say for the sake of argument that you are right and I am indeed misunderstanding the spec. Then the logical conclusion is that it's very difficult for even experienced programmers to agree on basic interpretations of what is and what isn't UB in C.

      7 replies →

Author here.

> A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned71) for the referenced type, the behavior is undefined.

C23 6.3.2.3p7.

Byte and int has different alignment requirements. It is UB the moment you make such a ptr.

Great way to demonstrate the point of the article.

  • That better be marked "historical". At least, Lemire says:

    On recent Intel and 64-bit ARM processors, data alignment does not make processing a lot faster. It is a micro-optimization. Data alignment for speed is a myth. // https://lemire.me/blog/2012/05/31/data-alignment-for-speed-m...

    (while in the olden days, a program may crash on unaligned access, esp on RISC)

  • Without memcpy there is no guarantee that that line produces an invalid pointer

    I don’t see what spec part would prohibit that cast from validly compiling to

       BIC r3, r0, #3
    

    Spec only guaranteed round-trip through char* of properly aligned for type pointers. This doesn’t break that.