Comment by pron
2 months ago
> Perhaps we can start with something "dumber" like Rust or any typed program. If you want to write something correct, or you care about correctness, you should not be using dynamic languages. The most useful and used type of test is type checking.
Lean or TLA+ are to Rust/Java/Haskell's type systems what algebraic topology and non-linear PDEs are to "one potato, two potatoes". The level of "correctness" achievable with such simple type systems is so negligible compared to what you can express and prove in rich formal mathematics languages that it barely registers (simple types do make some grunt work easier, but if we're talking about a world where a machine can do the more complicated things, a little more grunt work doesn't matter).
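For concreteness, a minimal Lean 4 sketch of that gap (the lemma `List.reverse_reverse` is standard library; the example itself is illustrative, not from the comment): a mainstream type checker can only confirm that `reverse` has type `List α → List α`, while Lean can state and machine-check that reversing twice is the identity, a property no ordinary Rust/Java/Haskell signature even expresses.

```lean
-- A simple type system stops at: reverse : List α → List α.
-- Lean can state and prove the behavioral property itself:
theorem reverse_twice_id (l : List Nat) : l.reverse.reverse = l := by
  simp  -- closed by the standard simp lemma List.reverse_reverse
```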
> I mean... sure, but I just want the first 80%. We don't have that. Instead, we are building kernels and infrastructure out of bash scripts that do who-knows-what. We need a tool solid and rigid enough that LLMs can use it to work through all of that.

> It should be something familiar (imperative style, like C), easy to read (perhaps with type inference), and with a strong, modern type system (just give me sum types, for God's sake). Perhaps Python with (real) types.
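For concreteness, a minimal sketch of what the quoted wish might look like today, assuming Python 3.11+ (for `typing.assert_never`) and an external checker such as mypy or pyright; the names (`Ok`, `Err`, `describe`) are illustrative, not from the thread:

```python
from dataclasses import dataclass
from typing import assert_never

# A tagged union ("sum type") modeled with dataclasses and a union alias.
@dataclass
class Ok:
    value: int

@dataclass
class Err:
    message: str

Result = Ok | Err

def describe(r: Result) -> str:
    match r:
        case Ok(value=v):
            return f"ok: {v}"
        case Err(message=m):
            return f"error: {m}"
        case _:
            # mypy/pyright report an error here if a new variant is
            # added to Result but not handled above.
            assert_never(r)

print(describe(Ok(42)))       # ok: 42
print(describe(Err("boom")))  # error: boom
```

Exhaustiveness checking via `assert_never` is most of what sum types buy you; whether an optional, external checker counts as "(real) types" is presumably the sticking point.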
But if LLMs get to the point where they're smart enough to deal with the trickier aspects of programming, what makes you think they need help with the easier parts? Conversely, if they're not smart enough to deal with the trickier parts, why would a little help move the needle much? Despite trying, research has not been able to find a significant general[1] effect of language design on correctness or productivity for human programmers (at least among more-or-less high-level languages; I'm not talking Java vs Assembly). We all have our preferences, and we tend to think they're universal, but it's precisely because of this bias that empirical study is needed, and it has not been conclusive.
If there's no big impact on humans, why assume there would be one for LLMs? I'm not saying that LLMs think like humans, but the default hypothesis should be that something makes no big difference when there's no known case in which it does. In other words, if something has no known effect, we shouldn't assume it will have one here (it could, but we'd first need good empirical evidence for that).
[1]: Research did find some differences between TypeScript and JavaScript specifically, but that result hasn't generalised.