Comment by ameliaquining
5 months ago
One thing I appreciated about this post, unlike a lot of AI-skeptic posts, is that it actually makes a concrete falsifiable prediction; specifically, "LLMs will never manage to deal with large code bases 'autonomously'". So in the future we can look back and see whether it was right.
For my part, I'd give 80% confidence that LLMs will be able to do this within two years, without fundamental architectural changes.
"Deal with" and "autonomously" are doing a lot of heavy lifting there. Cursor already does a pretty good job indexing all the files in a code base in a way that lets it ask questions and get answers pretty quickly. It's just a matter of where you set the goalposts.
Cursor fails miserably for me even on something as simple as consistently replacing function calls with method calls, like I said in the post. That, I would hope, is fixable. By "dealing autonomously" I mean you don't need a programmer: a PM talks to an LLM and that's how the code base is maintained, and this happens widely (rather than in one or two famous cases that are well known to be special and different from most work).
By "large" I mean 300K lines (strong prediction), or 10 times the context window (weaker prediction)
I don't shy away from looking stupid in the future; you've got to give me this much.
I'm pretty sure you can do that right now in Claude Code with the right subagent definitions.
(For what it's worth, I respect and greatly appreciate your willingness to put out a prediction based on real evidence and your own reasoning. But I think you must be lacking experience with the latest tools & best practices.)
"LLM" as well, because coding agents are already more than just an LLM. There is very useful context management around it, and tool calling, and ability to run tests/programs, etc. Though they are LLM-based systems, they are not LLMs.
Indeed. If the LLM calls a chess engine tool behind the scenes, it would be able to play excellent chess as well.
This rapidly gets philosophical. If I use tools, am I not handling the codebase? Are we classing the LLM as tool or user in this scenario?
True, there'd be a need to operationalize these things a bit more than is done in the post to have a good advance prediction.
"Autonomously"? What happens with subtle updates that aren't bugs but change the meaning of some features, breaking the workflow of some external part of a client's system? It happens all the time, and because it's really hard to keep the full meaning and business rules written down and up to date, an LLM might never be able to grasp that meaning. Maybe if, instead of developing code and infrastructure, the whole industry shifted toward writing impossibly precise spec sheets that make meaning and intent crystal clear, then "autonomously" might be possible to pull off.
Those spec sheets exist: they're called software.
Not exactly. It depends on how the software is written and whether there are ADRs in the project. I've had to work on projects where there were bugs because someone coded business rules in a very bad, unclear way. You move an if somewhere and something breaks somewhere else. When the software isn't clear enough (and often it isn't, because we have to go fast), you ask "is this condition the way it's supposed to work, or is it a bug?" and get people to confirm the rule. My point is this: very well-written software surely works best with LLMs. That's not how most software is written today, because businesses sometimes value speed over engineering (or it's a lack of skill).
I don't think that statement is falsifiable until you define "deal with" and "large code bases."
How large? What does "deal" mean here? Autonomously - is that on its own whim, or at the behest of a user?
That feels like a statement that's far too loosely defined to be meaningful to me.
I work on codebases that you could describe as 'large', and you could describe some of the LLM driven work being done on them as 'autonomous' today.
> LLMs will never manage to deal with large code bases 'autonomously'
Absolutely nothing about that statement is concrete or falsifiable.
Hell, you can already deal with large code bases 'autonomously' without LLMs - grep and find and sed go a long way!
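For what it's worth, here is a minimal sketch (in Python rather than grep/sed) of the kind of purely mechanical, codebase-wide edit being described, along the lines of the function-call-to-method-call rewrite mentioned above; the function name and source directory are made up for illustration:

```python
import re
from pathlib import Path

# Hypothetical example: rewrite calls like `helper(obj, x, y)` into method
# calls `obj.helper(x, y)` across a source tree. Only handles the simple
# two-or-more-argument case; a real refactor would want an AST-based tool.
CALL = re.compile(r"\bhelper\((\w+),\s*")

def rewrite_tree(root: str) -> int:
    """Apply the rewrite to every .py file under `root`; return files changed."""
    changed = 0
    for path in Path(root).rglob("*.py"):
        src = path.read_text()
        new = CALL.sub(lambda m: f"{m.group(1)}.helper(", src)
        if new != src:
            path.write_text(new)
            changed += 1
    return changed

if __name__ == "__main__":
    print(rewrite_tree("src"), "files rewritten")  # "src" is a placeholder path
```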
Seems falsifiable to me? If an LLM (+harness) is fully maintaining a project, updating things when dependencies update, handling bug reports, etc., in a way that is considered decent quality by consumers of the project, then that seems like it would falsify it.
Now, that’s a very high bar, and I don’t anticipate it being cleared any time soon.
But I do think if it happened, it would pretty clearly falsify the hypothesis.
In two years there will probably be no new 'autonomous' LLMs; they will most likely be integrated into 'products' trained and designed for this. We see the beginning of it today with agents and tools.
> LLMs will never manage to deal
time to prove hypothesis: infinity years
The whole of modern science is based on the idea that we can never prove a theory about the world to be true, but that we can provide experiments which allow us to show that some theories are closer to the truth than others.
Eh, if the hypothesis remains unfalsified for longer and longer, we can have increased confidence.
Similarly, Newton's first law says that a body at rest stays at rest unless acted upon by a force. Strictly speaking, if a billiard ball jumped up without cause tomorrow, that would disprove Newton. So we'd have to wait an infinite amount of time to prove Newton right.
However, no one has to wait that long, and we found ways to express how Newton's ideas are _better_ than those of Aristotle without waiting an eternity.