Comment by ncruces

1 day ago

Why can't it work though?

And I'm talking about both things.

Integer arithmetic that produces pointers that are just out of bounds of an object: why can't this work? Why can't the compiler assume that, since I explicitly converted a pointer to an integer, the pointed-to object can't be put into a register, or made to go out of scope early?

Second, fabricating pointers. If I have a pointer to mmap/sbrk memory, shouldn't I be allowed to “fabricate” arbitrary pointers from integers that point into that area? If not, why not?

Finally, Wasm. The linear memory is addressable from address 0 up to __builtin_wasm_memory_size * PAGESIZE. Given this, and except perhaps address zero, why should it be undefined behavior to dereference any other address?

What's the actual advantage of making these undefined behavior? What do we gain in return?

In practice, if you do a volatile read at an arbitrary mapped address, it will work. But you have no guarantee about what you will read from it, even if it happens to match the address of a variable you just wrote to.

Formally it is undefined, as there is no way to give it sane semantics, and it will definitely trip sanitizers and similar memory-safety tools.

  • So how can you implement an allocator in the language itself? I'm not even talking about malloc (which gets language-blessed semantics); say, an arena allocator.

    You get a bunch of memory from mmap. There are no “objects” (in C terminology) in there, and a single “provenance” (if any at all; you got it from a syscall, which is not part of the language).

    If arbitrary integer/pointer math inside that buffer is not OK, how do you get heterogeneous allocations from the arena to work? When do slices of it become “objects” and gain any other “provenance”?

    Is C supposed to be the language you can't write an arena allocator in (or a conservative GC, or…)?

    • Handling allocators correctly is actually quite problematic. In C++ you would placement-new into the raw storage, which ends the lifetime of whatever was there and starts the lifetime of a new object; as long as you use the pointer returned by operator new (or use std::launder), formally you are OK.

      Famously, you cannot implement an allocator in C on top of static named storage. I understand that on anonymous memory (like that returned by sbrk or mmap, or by an upstream allocator) it should work fine, but, as a C++ programmer, I'm not familiar with the specific details of the C lifetime model that allow it. I understand that stores into anonymous memory can change the dynamic type (or whatever the C equivalent is) of the object.

      In any case, the issue is about object lifetimes and aliasing rather than pointer provenance: you can treat the anonymous memory as just a char array, and pointers you form into it will carry the correct provenance information.

    • You don't; that is why people who never bothered with ISO C legalese think several tricks are C, when in practice they are the "Their C Compiler" language.

      Several things in C, if you want to stay within the guarantees of ISO C for portable code, have to be written in straight assembly, not even inline, as the standard only states that an asm keyword must exist, leaving all other details to the implementation.

Integer arithmetic working correctly, converting a pointer to an integer, and converting an integer back to the same object is something which should work. This is what we made sure to guarantee in the provenance TS. Making this work for memory outside of any object, or converting back to a different object whose address was not previously converted to an integer, is difficult. It can be made to work, but then you need to give up a lot (and rewrite your compilers). The outside-of-an-object case is clear: there might be no memory mapped there. Converting back to an arbitrary object does not work because the compiler must somehow know that it cannot put the object into a register. If you allow conversion back to arbitrary objects, it cannot put anything into registers. This would be bad.

Fabricating pointers in an implementation-defined way is obviously ok. This would cover mmap / sbrk or mapped I/O. Note also that historically, the C standard left things UB exactly so that implementations can use it for extensions (e.g. mmap). The idea that UB == invalid program is fairly recent misinformation, but we have to react to it and make things at least implementation-defined (which also meant a bit something else before).

  • > Note also that historically, the C standard left things UB exactly so that implementations can use it for extensions (e.g. mmap). The idea that UB == invalid program is fairly recent misinformation, but we have to react to it and make things at least implementation-defined (which also meant a bit something else before).

    I'll just finish by saying: yes please. And thank you for bearing with me.