Comment by jchw
5 hours ago
I wonder if some day we'll look back differently on the "billion-dollar mistake" thing. The key problem with null references is that they force you either to check any given reference for null when you don't already know, or to rely on a contract that it can't be null. Not having null references really does solve that problem, but in everyday programs you still often wind up in situations where you can know from the outside that some function will return a non-empty value, while the function itself can't make that guarantee in a way the compiler can enforce; in those cases, you have no choice but to face the same dilemma. In Rust this plays out with `unwrap()`, which most reasonably-sized codebases end up with some of in practice. You could always forbid it, but that's only somewhat of an improvement, because in a lot of cases there's nothing logical to do once the invariant has failed. (Though for critical production workloads, it's probably a good idea to find something to do other than let the program crash outright, even if it's still treated as an emergency.)
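Here's a minimal Rust sketch of what I mean (the names `find_user`/`greet` are just made up for illustration):

```rust
use std::collections::HashMap;

// The type system can only say "maybe there's a value here".
fn find_user(users: &HashMap<u32, String>, id: u32) -> Option<&String> {
    users.get(&id)
}

fn greet(users: &HashMap<u32, String>, id: u32) {
    // The caller "knows" the id was just inserted, but the compiler
    // can't see that invariant, so we're back to the same dilemma:
    // unwrap() (panic if we're wrong) or invent some fallback.
    let name = find_user(users, id).unwrap();
    println!("hello, {name}");
}

fn main() {
    let mut users = HashMap::new();
    users.insert(1, "alice".to_string());
    greet(&users, 1); // fine today; a refactor elsewhere could silently break the invariant
}
```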
In other words: after all this time, I feel that Tony Hoare framing null references as the billion-dollar mistake may be overselling it at least a little. Making references non-nullable by default is an improvement, but the same problem plays out so long as you ever have a situation where the type system can't guarantee the presence of a value you "know" must be there. (And even with formal specifications/proofs, I'm not sure we'll ever reach the point where such a proof is always feasible.) The only real question is how much of the problem is solved by not having null references, and I think it's less than people acknowledge.
(edit: Of course, it might actually be possible to quantify this, but I wasn't able to find publicly-available data. If any organization were positioned to, I reckon it would be Uber, since they've developed and deployed both NullAway (Java) and NilAway (Go). But sadly, I don't think they've published any figures on the number of NPEs/panics before and after. My guess is that it's split: it probably did help some services significantly reduce production issues, but I bet it's even better at catching, even earlier, the kinds of bugs that would likely have been caught pre-production anyway.)
I think Hoare is bang on, because the similar values we know of in many languages are also problematic, even though they're not related to memory.
NaNs are, as their name indicates, not numbers. So the fact that this 32-bit floating-point parameter might be NaN, which isn't even a number, is as unhelpful as discovering that the Goose you were passed as a parameter is null (i.e. not actually a Goose at all).
There's a good chance you've run into at least one bug where, oops, that value is NaN, and now the NaN has spread and everything is ruined.
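A quick Rust illustration of that spread:

```rust
fn main() {
    // 0.0 / 0.0 is the classic way to mint a NaN.
    let nan = 0.0f32 / 0.0;

    // Once produced, NaN contaminates every arithmetic result downstream...
    let total = nan + 1.0;
    println!("{total}");            // NaN
    println!("{}", total * 100.0);  // NaN

    // ...and it isn't even equal to itself, so naive equality checks miss it.
    println!("{}", total == total); // false
    println!("{}", total.is_nan()); // true: the reliable test
}
```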
The IEEE NaNs are baked into the hardware everybody uses, so it will be harder to break away from this situation than from the Billion Dollar Mistake, but it's clearly not a coincidence that this type of problem occurs for other types too, so I'd say Hoare was right on the money, and that we're finally moving in the correct direction.