Comment by uecker

1 day ago

You are right, I wasn't thinking straight. I do not fully agree. Creating arbitrary pointers cannot work. Forging pointers to implementation-defined memory regions would be ok though.

Why can't it work though?

And I'm talking about both things.

First, integer arithmetic that produces pointers that are just out of bounds of an object. Why can't this work? Why can't the compiler assume that, since I explicitly converted a pointer to an integer, the pointed-to object can't be put into a register, or made to go out of scope early?

Second, fabricating pointers. If I have a pointer to mmap/sbrk memory, shouldn't I be allowed to “fabricate” arbitrary pointers from integers that point into that area? If not, why not?

Finally, Wasm. The linear memory is addressable from address 0 to __builtin_wasm_memory_size * PAGESIZE. Given this, and except maybe for address zero, why should it be undefined behavior to dereference any other address?

What's the actual advantage to making these undefined behavior? What do we gain in return?
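To make the first question concrete, here is a minimal C sketch of the two uncontested cases: a round trip through `uintptr_t` back to the same object, and integer arithmetic that stays inside the array the address came from. The helper names `round_trip`, `step_by_integer`, and `demo` are hypothetical, and the sketch assumes a mainstream implementation with a flat address space where `uintptr_t` exists. The disputed cases (an address outside the object, or an integer that was never derived from the target object) are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round-trip a pointer through uintptr_t unchanged. */
static int *round_trip(int *p) {
    uintptr_t addr = (uintptr_t)p;   /* pointer converted to an integer */
    return (int *)addr;              /* converted back to the same object */
}

/* Hypothetical helper: advance by n elements using integer arithmetic only.
   The caller must keep the result inside the array that p points into. */
static int *step_by_integer(int *p, size_t n) {
    uintptr_t addr = (uintptr_t)p + n * sizeof(int);
    return (int *)addr;
}

/* Uses both helpers on one local array and never leaves its bounds. */
static int demo(void) {
    int values[4] = {10, 20, 30, 40};
    int *same = round_trip(&values[1]);      /* refers to values[1] */
    int *next = step_by_integer(values, 2);  /* refers to values[2] */
    return *same + *next;                    /* 20 + 30 */
}
```

Both pointers here are recovered from an address that was taken from `values` itself, which is what separates this from fabricating a pointer to some unrelated object.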

  • In practice if you do a volatile read at an arbitrary mapped address it will work. But you have no guarantee regarding what you will read from it, even if it happens to match the address of a variable you just wrote into.

    Formally it is undefined, as there is no way to give it sane semantics and it will definitely trip sanitizers and similar memory safety tools.

    • So how can you implement an allocator in the language itself? Not even talking about malloc (which gets to have language-blessed semantics); say an arena allocator.

      You get a bunch of memory from mmap. There are no “objects” (in C terminology) in there, and a single “provenance” (if at all, you got it from a syscall, which is not part of the language).

      If arbitrary integer/pointer math inside that buffer is not OK, how do you get heterogeneous allocations from the arena to work? When do slices of it become “objects” and gain any other “provenance”?

      Is C supposed to be the language you can't write an arena allocator in (or a conservative GC, or…)?

  • Integer arithmetic working correctly, converting a pointer to an integer, and converting an integer back to the same object is something which should work. This is what we made sure to guarantee in the provenance TS. Making this work for memory outside of any object, or for converting back to a different object whose address was not previously converted to an integer, is difficult. It can be made to work, but then you need to give up a lot (and rewrite your compilers). The case of memory outside of an object is clear, because there might be no memory mapped there. Converting back to an arbitrary object does not work because the compiler must somehow know that it cannot put the object into a register. If you allow conversion back to arbitrary objects, it cannot put anything into registers. This would be bad.

    Fabricating pointers in an implementation-defined way is obviously ok. This would cover mmap / sbrk or mapped I/O. Note also that historically, the C standard left things UB exactly so that implementations can use it for extensions (e.g. mmap). The idea that UB == invalid program is fairly recent misinformation, but we have to react to it and make things at least implementation-defined (a term which also used to mean something slightly different).
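As a rough illustration of the arena question raised above, here is a minimal bump-allocator sketch in C. The names `Arena`, `arena_init`, and `arena_alloc` are hypothetical, and the buffer is assumed to come from somewhere the implementation defines (mmap, malloc, or a plain byte array). The integer address is used only to compute alignment padding; the returned pointer is derived from `base` by ordinary pointer arithmetic, so no pointer is fabricated from a bare integer. This is one way to sidestep the contested cases, not a claim about what the standard or the provenance TS requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena: a byte buffer plus a bump offset. */
typedef struct {
    unsigned char *base;  /* start of the backing buffer */
    size_t size;          /* total bytes available */
    size_t used;          /* bytes handed out so far, including padding */
} Arena;

static void arena_init(Arena *a, void *buffer, size_t size) {
    a->base = buffer;
    a->size = size;
    a->used = 0;
}

/* Returns size bytes aligned to align (a power of two), or NULL if full. */
static void *arena_alloc(Arena *a, size_t size, size_t align) {
    /* The integer address is inspected only to work out the padding. */
    uintptr_t addr = (uintptr_t)(a->base + a->used);
    size_t pad = (size_t)((align - (addr & (align - 1))) & (align - 1));
    size_t room = a->size - a->used;
    if (pad > room || size > room - pad) {
        return NULL;
    }
    /* The result is derived from base, not converted from an integer. */
    void *p = a->base + a->used + pad;
    a->used += pad + size;
    return p;
}
```

A caller would typically pass `_Alignof(T)` as `align` and store a `T` through the returned pointer. Whether and when that slice becomes an object of type `T` is exactly the point under discussion in this thread; the sketch only shows that the allocation arithmetic itself can stay within one buffer.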