Comment by dwattttt

21 hours ago

I imagine it's (implicitly?) referring to avoiding whole-of-program analysis.

For example, given a declaration

  int* func(int* a);

What's the relationship between the return value and the input? You can't know without diving into 'func' itself: it could return the same pointer it was given, or it could return a freshly allocated one, and that's before getting into the even more esoteric options.
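To make that concrete, here is a minimal sketch of two definitions that both fit the declaration above. The names func_alias, func_fresh and demo_same_signature are made up for illustration; nothing here comes from the paper under discussion.

```cpp
#include <cassert>
#include <type_traits>

// Both functions match the shape of `int* func(int* a);`,
// yet they relate the return value to the argument in opposite ways.

// Returns the very pointer it was given: the result aliases `a`
// and must not be used after the object `a` points to is gone.
int* func_alias(int* a) {
    return a;
}

// Returns a freshly allocated copy: the result is independent of `a`,
// and the caller is now responsible for calling `delete` on it.
int* func_fresh(int* a) {
    return new int(*a);
}

// The type system sees no difference between the two.
static_assert(std::is_same<decltype(&func_alias), decltype(&func_fresh)>::value,
              "identical signatures, different contracts");

// Hypothetical caller: its obligations differ, but only the bodies say so.
void demo_same_signature() {
    int x = 7;
    int* p = func_alias(&x);  // borrowed: deleting p would be a bug
    int* q = func_fresh(&x);  // owned: failing to delete q is a leak
    assert(p == &x);
    assert(q != &x && *q == 7);
    delete q;
}
```

A caller that only sees the declaration has no way to tell which of the two it is linking against, which is the whole problem.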

Trying to solve this without recursively analysing a whole program at once is infeasible.

Rust's approach was to require more information to be provided in function signatures (lifetime annotations), but that's new syntax, and not backwards compatible, so not a palatable option for C++.

> avoiding whole-of-program analysis

Why, though?

Perhaps it's unfeasibly complex? But if that's the argument, then that's an argument that needs to be made. The paper sets out to refute the idea that C++ already has the information needed for safety analysis, but the examples throw away most of the information C++ does have, without explanation. I can't really take it seriously.

  • In general, there are three reasons to avoid whole program analysis:

    1. Complexity. This manifests as compile times: analysing the whole program takes much longer than analysing one function at a time.

    2. Usability. Error messages are poor, because changes have nonlocal effects.

    3. Stability. This is related to 2. Without requirements expressed in the signature, changes in the body change the API, meaning keeping APIs stable is much harder.

    There’s really a simple reason why it’s not fully feasible in C++ though: C++ supports separate compilation. This means the whole program is not required to be available. Therefore you don’t have the whole program for analysis.

    • It's not even required for the information to be present at link time. C/C++ doesn't require the pointer to always be owned or not-owned; it's valid for that to be decided by configuration loaded at runtime, or for it to be decided at random.

      Trying to establish proofs that the pointer is one way or the other can't work, because the pointer doesn't have to be only one or the other.

      The fact that you then have to treat the pointer one way or the other is a problem: if you reduce the allowed programs so that the pointer must be one of the two, that's a back-compat hazard. If you don't constrain it, you need to require that additional information be carried somewhere to determine how to treat it.

      If you do neither, you don't have the information needed to safely dispose of the pointer.
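A rough sketch of that dilemma, assuming a made-up function get_value whose result is owned or borrowed depending on a runtime flag. MaybeOwned, get_value_tagged and dispose are hypothetical helpers showing one way the extra information could be carried; they are not an existing API.

```cpp
#include <cassert>

static int g_shared = 42;  // hypothetical long-lived object owned elsewhere

// Legal C++: whether the caller owns the result depends on a runtime value.
// No static analysis can classify the returned pointer as owned or borrowed,
// because it is genuinely both, depending on the call.
int* get_value(bool allocate) {
    return allocate ? new int(42) : &g_shared;
}

// One way to carry the missing information alongside the pointer.
struct MaybeOwned {
    int* ptr;
    bool owned;  // true: release with delete; false: leave alone
};

MaybeOwned get_value_tagged(bool allocate) {
    return MaybeOwned{get_value(allocate), allocate};
}

// Disposal is only safe because the tag travels with the pointer.
void dispose(MaybeOwned v) {
    if (v.owned) {
        delete v.ptr;
    }
}

void demo_runtime_ownership() {
    MaybeOwned a = get_value_tagged(true);
    MaybeOwned b = get_value_tagged(false);
    assert(a.owned && a.ptr != &g_shared);
    assert(!b.owned && b.ptr == &g_shared);
    dispose(a);  // frees the heap allocation
    dispose(b);  // no-op: g_shared is not ours to free
    assert(g_shared == 42);
}
```

With the bare int* from get_value, the caller has to either always delete (wrong for the borrowed case) or never delete (a leak in the owned case).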

  • Local reasoning is the foundation of everything formal (this includes type systems), and anyone in the type-system-design space would know that. Graydon Hoare (ex-Rust dev) wrote a post about it too (which links to another great withoutboats post in the very first line): https://graydon2.dreamwidth.org/312681.html

    The entire point of having a static type system is to enable local reasoning. Otherwise, we would just do whole-program analysis on JS instead of inventing TypeScript.
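As a small illustration of what local reasoning from a signature looks like in today's C++, here is a sketch using std::unique_ptr from the standard library. The function names make_copy, pass_through and demo_signature_contracts are invented for the example. Note that this only covers ownership; it does not express which input a returned reference is tied to, which is the part the thread is arguing about.

```cpp
#include <cassert>
#include <memory>

// Ownership is stated in the signature: the caller owns the result,
// and that is knowable without reading the body.
std::unique_ptr<int> make_copy(const int& a) {
    return std::unique_ptr<int>(new int(a));
}

// The signature says no ownership is transferred. It still does not say
// which object the returned reference refers to or how long it stays valid.
int& pass_through(int& a) {
    return a;
}

void demo_signature_contracts() {
    int x = 3;
    std::unique_ptr<int> owned = make_copy(x);  // freed automatically
    int& borrowed = pass_through(x);            // aliases x
    borrowed = 4;
    assert(x == 4);       // the write went through the alias
    assert(*owned == 3);  // the copy is independent of x
}
```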