Comment by Sharlin
11 hours ago
If we assume that the amount of training data matters at least a bit (which is a very reasonable assumption), I wouldn’t immediately discard the functional hypothesis. Scala’s score is almost equal to Java’s even though there’s probably something like two orders of magnitude less Scala than Java code in the wild. Similarly with C# and Racket.
Yep I think you can reasonably argue that immutability + strong conventions are the most important dimensions (as opposed to FP vs. OOP, as much as I like FP and dislike OOP):
Immutable by convention + Strong conventions: 91.3% - Elixir 97.5%, Kotlin 90.5%, Racket 88.9%, C# 88.4%
Immutable by convention + Fragmented: 78.4% - Scala 78.4% (n=1)
Mutable + Strong conventions: 77.5% - Ruby 81.0%, Swift 78.5%, Julia 78.5%, Dart 78.0%, Go 71.7%
Mutable + Fragmented: 67.9% - Java 80.9%, R 75.8%, C++ 75.8%, Shell 72.9%, Python 65.3%, Perl 64.5%, TS 61.3%, JS 60.9%, PHP 53.8%
(my grouping is somewhat subjective)
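For anyone who wants to re-check the group averages or try a different grouping, here is a minimal sketch. The scores are copied from the lists above; the dictionary layout and the `group_means` helper are just illustrative names, and the group assignments are the same subjective ones.

```python
from statistics import mean

# Per-language scores (percent), copied from the grouping above.
# The assignment of languages to groups is subjective.
groups = {
    "Immutable by convention + Strong conventions": {
        "Elixir": 97.5, "Kotlin": 90.5, "Racket": 88.9, "C#": 88.4,
    },
    "Immutable by convention + Fragmented": {
        "Scala": 78.4,
    },
    "Mutable + Strong conventions": {
        "Ruby": 81.0, "Swift": 78.5, "Julia": 78.5, "Dart": 78.0, "Go": 71.7,
    },
    "Mutable + Fragmented": {
        "Java": 80.9, "R": 75.8, "C++": 75.8, "Shell": 72.9, "Python": 65.3,
        "Perl": 64.5, "TS": 61.3, "JS": 60.9, "PHP": 53.8,
    },
}


def group_means(groups):
    """Return the unweighted mean score of each group, rounded to one decimal."""
    return {name: round(mean(scores.values()), 1) for name, scores in groups.items()}


for name, avg in group_means(groups).items():
    print(f"{name}: {avg}% (n={len(groups[name])})")
```

Moving a language between groups only takes editing the dictionary, which makes it easy to see how sensitive the averages are to the grouping.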
I agree with you, but, from the article: "The amount of training data doesn’t matter as much as we thought. Functional paradigms transfer well"
Anyway, I tend to think you are right, and the article is wrong in that sentence. (Or did I misinterpret something?)
I think both the quantity and the quality of the training data have a big influence on the results.
I took that to mean ≈ "Amount of training data isn't the big factor dwarfing all else." Depends who "we" refers to, I guess. Back when LLM-generated code was new, I definitely saw predictions that LLMs would struggle with niche or rarely used languages. These days, consensus among colleagues within earshot is that LLMs handle Rust much better than Python or C++ (corpus size and AutoCodeBench scores notwithstanding).