Comment by berdon

20 hours ago

There is a guaranteed cap on how far LLM-based AI models can go. Models improve by being trained on better data. LLMs being used to generate millions of lines of sloppy code will substantially dilute the pool of good training data. Developers moving over to AI-based development will cease to grow and learn, producing less novel code.

The massive increase in slop code and the loss of innovation will establish an unavoidable limit on LLMs.

Maybe we'll train the LLMs in our ways of using them, and the next generation of coding assistants will be another layer in between us and the code. You talk to the chief engineer LLM, which in turn talks to its cadre of Claude Code instances running in virtual tmux. \hj?
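
Half-seriously, the layering could look something like the sketch below. Nothing here is a real agent API; plan_subtasks and run_worker are made-up placeholders for the "chief engineer" model and the per-pane coding agents.

```python
# Hypothetical sketch of a "chief engineer" layer delegating to worker agents.
# plan_subtasks() and run_worker() are invented placeholders, not a real API.
from dataclasses import dataclass


@dataclass
class Subtask:
    description: str


def plan_subtasks(request: str) -> list[Subtask]:
    # The orchestrating model would split the request into independent work items.
    return [Subtask(f"{request} (part {i})") for i in range(1, 4)]


def run_worker(task: Subtask) -> str:
    # Each worker would be its own coding-agent session (one per tmux pane, say)
    # and hand back a patch or summary for review.
    return f"patch for: {task.description}"


def chief_engineer(request: str) -> list[str]:
    """The only layer you talk to; it talks to the workers."""
    return [run_worker(t) for t in plan_subtasks(request)]


print(chief_engineer("add rate limiting to the API"))
```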

I think most of the progress now comes from training by reinforcement learning on automated assessments of the code produced, so data is not really an issue.
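
For what it's worth, here's a minimal sketch of what "automated assessments as the training signal" means, with the assessment reduced to running the candidate against a few asserts. The sampling and policy-update machinery is omitted, and none of this is a real training API.

```python
# Toy illustration: reward generated code by whether it passes automated checks,
# rather than by how closely it matches reference text.
import subprocess
import sys
import tempfile


def automated_reward(candidate_code: str, tests: str, timeout: float = 5.0) -> float:
    """Run candidate + tests in a subprocess; 1.0 if everything passes, else 0.0."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n\n" + tests)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0


# The "assessment" is just asserts that must hold.
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"
candidate = "def add(a, b):\n    return a + b\n"
print(automated_reward(candidate, tests))  # 1.0 -> positive signal for the policy update
```

In a real setup that scalar would feed a policy update over many sampled completions, but the point is that the label comes from execution, not from more scraped text.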

But they're not just training off code and its use, but off a corpus of general human knowledge in written form.

I mean, in general not only do they have all of the crappy PHP code in existence in their corpus, but they also have Principia Mathematica, and probably The Art of Computer Programming. And it has become increasingly clear to me that the models have bridged the gap from "autocomplete based on code I've seen" to some sort of distillation of first-order logic based on them just reading a lot of language... and some fuzzy attempt at reasoning that came out of it.

Plus the agentic tools driving them are increasingly ruthless at wringing out good results.

That said -- I think there is a natural cap on what they can achieve as pure coding machines, and they're pretty much there IMHO. The results are usually that I get what I asked for, almost 100% of the time, and it tends to "just do the right thing."

I think the next step is actually to make it scale and make it profitable, but also...

fix the tools -- they're not what I want as an engineer. They try to take over, they don't put me in control, and they create a very difficult review and maintenance problem. Not because they make bad code, but because they make code that nobody feels responsible for.

That is a naive assumption. Or rather, multiple naive assumptions: developers mostly don’t move over to AI development, but integrate it into their workflow. Many of them will stay intellectually curious and thus focus their attention elsewhere; I’m not convinced they will all just suddenly stagnate.

Also, training data isn’t just crawled text from the internet anymore; it’s also sourced from interactions of millions of developers with coding agents, manually provided sample sessions, deliberately generated code, and more. There is a massive amount of money and research involved here, so that’s another bet I wouldn’t be willing to make.