Comment by menaerus

2 months ago

Yes, but my point is when things blow up how exactly do you know which unsafe block you should look into? From their statement it appears as if there's such a simple correlation between "here's your segfault" and "here's your unsafe block that caused it", and which I believe there isn't, and which is why I said there's no encapsulation, local reasoning etc.

4 comments

menaerus

Xylakant 2 months ago

But you know that it's one of them. That cuts 96% of their codebase out, even assuming you have no idea which one it could be.

aw1621107 2 months ago

> Yes, but my point is when things blow up how exactly do you know which unsafe block you should look into?

In the most general case, you don't. But again, I think that that rather misses the point the statement was trying to get at.

Perhaps a more useful framing for you would be that in the most general case the encapsulation and local reasoning here is between modules that use unsafe and everything else. In some (many? most?) cases you can further bound how much code you need to look at if/when something goes wrong since not all code in unsafe modules/functions/blocks depend on each other, but in any case the point is that you only need to inspect a subset of code when reasoning about safety invariants and/or debugging a crash.

> From their statement it appears as if there's such a simple correlation between "here's your segfault" and "here's your unsafe block that caused it",

I don't get that sense from the statement at all.

menaerus 2 months ago
> in the most general case the encapsulation and local reasoning here is between modules that use unsafe and everything else
This would be the same narrative as in, let's say, C++. Wrap the difficult and low-level memory juggling stuff into "modules", harden the API, return the references and/or smart-pointers, and then just deal with the rest of the code with ease, right? Theoretically possible but practically impossible.
First reason is that abstractions get really leaky, and they especially get really leaky in the code that demands the upmost performance. Anyone who implemented their own domain/workload specific hash-map or mutex or anything similarly foundational will understand this sentiment. Anyway, if we just have a look into the NVMe driver above, there're no "unsafe modules".
Second, and as I already argued, UB in the module library transcends into the rest of your code so I fail to understand how is it so that the dozens of unsafe sections make the reasoning or debugging any more simpler when reasoning is actually not a function of number of unsafe sections but it is the function of interactions between different parts of the code that end up touching the memory in the unsafe block in a way that it was not anticipated. This is almost always the case when dealing with undefined behavior.
> I don't get that sense from the statement at all.
It is a bit exaggerated example of mine but I do - their framing suggests ~exactly that and which is simply not true.
- aw1621107 2 months ago
  
  > This would be the same narrative as in, let's say, C++. Wrap the difficult and low-level memory juggling stuff into "modules", harden the API, return the references and/or smart-pointers, and then just deal with the rest of the code with ease, right? Theoretically possible but practically impossible.
  The difference, of course, is in the amount of automated help/enforcement provided that makes it harder/impossible to misuse said API. Just like C++ provides new functionality compared to C that makes it hard-to-impossible to misuse APIs in certain ways (RAII, stronger type system, etc.), and how C does the same compared to assembly (providing structured control flow constructs, abstracting away low-level details like calling convention/register management, etc.), Rust provides new functionality that previous widespread languages didn't. It's those additional capabilities that make previously difficult things practical.
  > so I fail to understand how is it so that the dozens of unsafe sections make the reasoning or debugging any more simpler when reasoning is actually not a function of number of unsafe sections but it is the function of interactions between different parts of the code that end up touching the memory in the unsafe block in a way that it was not anticipated.
  ...Because the way encapsulation works in practice is that only a subset of code can "touch[] the memory in the unsafe block in a way that it was not anticipated" in the first place? That's kind of the point of encapsulation!
  (I say "in practice" because Rust doesn't and can't stop you from writing unsafe APIs, but that's going to be true of any language due to Rice's Theorem, the halting problem, etc.)
  As a simple example, say you have a program with one singular unsafe block encapsulated in one single function which is intended to provide a safe API. If/when UB happens the effects can be felt anywhere, but you know the bug is within the encapsulation boundary - i.e., in the body of the function that wraps the unsafe block, even if the bug is not in the unsafe block itself (well, either that or a compiler bug but that's almost always not going to be the culprit). That certainly seems to me like it'd be easier to debug than having to reason about the entire codebase.
  This continues to scale up to multiple functions which provide either a combined or independent API to internal unsafe functionality, whole modules, etc. Sure, debugging might be more difficult than the single-function case due to the additional possibilities, but the fact remains that for most (all?) codebases the amount of code responsible for/causing UB will reside behind said boundary and is going to be a proper subset of the all the code in the project.
  And if you take this approach to the extreme, you end up with formal verification programs/theorem provers, which isolate all their "unsafe" code to a relatively small/contained trusted kernel.Even there, UB in the trusted kernel can affect all parts of compiled programs, but the point is that if/when something goes wrong you know the issue is going to be in the trusted kernel, even if you don't necessarily know precisely where.
  > It is a bit exaggerated example of mine but I do - their framing suggests ~exactly that and which is simply not true.
  I do agree that that the claim in that particular interpretation of that statement is wrong (and Rust has never offered such a correlation in the first place), but it's kind of hard to discuss beyond that if I don't interpret that sentence the same way you do :/