Comment by yosefk
5 months ago
Cursor fails miserably for me even just trying to replace function calls with method calls consistently, like I said in the post. This I would hope is fixable. By "dealing autonomously" I mean "you don't need a programmer - a PM talks to an LLM and that's how the code base is maintained, and this happens a lot (rather than in one or two famous cases where it's pretty well known how they are special and different from most work)"
By "large" I mean 300K lines (strong prediction), or 10 times the context window (weaker prediction)
I don't shy away from looking stupid in the future; you've got to give me this much
I'm pretty sure you can do that right now in Claude Code with the right subagent definitions.
(For what it's worth, I respect and greatly appreciate your willingness to put out a prediction based on real evidence and your own reasoning. But I think you must be lacking experience with the latest tools & best practices.)
If you're right, there will soon be a flood of software teams with no programmers on them - either across all domains, or in some domains where this works well. We shall see.
Indeed I have no experience with Claude Code, but I use Claude via chat, and it fails all the time on things not remotely as hard as orienting itself in a large code base. Claude Code is the same thing with the ability to run tools. Of course tools help to ground its iterations in reality, but I don't think they're a panacea absent a consistent ability to model the reality you observe through your use of tools. Let's see...
I was very skeptical of Claude Code but was finally convinced to try it, and it does feel very different to use. I made three hobby projects in a weekend that I had put off for years due to "it's too much hassle to get started" inertia. Two of the projects it did very well with; the third I had to fight with it, and it's still subtly wrong (SwiftUI animations and Claude Code are seemingly not a good mix!)
That being said, I think your analysis is 100% correct. LLMs are fundamentally stupid beyond belief :P
I am more skeptical of the rate of AI progress than many here, but Claude Code is a huge step. There were a few "holy shit" moments when I started using it. Since then, after much more experimentation, I see its limits and faults, and use it less now. But I think it's worth giving it a try if you want to be informed about the current state of LLM-assisted programming.
> Indeed I have no experience with Claude Code, but I use Claude via chat...
These are not even remotely similar, despite the name. Things are moving very fast, and the sort of chat-based interface that you describe in your article is already obsolete.
Claude is the model; Claude Code is a combination of internal tools for the agent to track its goals, current state, priorities, etc., and a looped mechanism that keeps it on track and focused and lets it debug its own actions. With the proper subagents it can keep its context from being poisoned by false starts, and its built-in todo system keeps it on task.
Really, try it out and see for yourself. It doesn't work magic out of the box, and absolutely needs some hand-holding to work well, but that's only because it is so new. The next generation of tooling will have these subagent definitions auto-selected and included in context so you can hit the ground running.
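For readers who haven't seen one: a subagent is defined as a markdown file under .claude/agents/ with YAML frontmatter. A minimal sketch - the frontmatter fields and tool names match Claude Code's documented format, but this particular agent is my own invention:

    ---
    name: repo-scout
    description: Read-only search agent. Use proactively to locate relevant
      files and symbols before any edit is attempted.
    tools: Read, Grep, Glob
    ---
    You are a read-only code scout. Given a task description, find the
    files, functions, and call sites involved and report them back as a
    short list of paths and line numbers. Never edit anything; your job
    is to keep the main agent's context clean.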
We are already starting to see a flood of software coming out with very few active coders on the team, as you can see on the HN front page. I say "very few active coders" not "no programmers" because using Claude Code effectively still requires domain expertise as we work out the bugs in agent orchestration. But once that is done, there aren't any obvious remaining stumbling blocks to a PM running a no-coder, all-AI product team.
FWIW I do work with the latest tools/practices and completely agree with OP. It's also important to contextualize what "large" and "complex" codebases really mean.
Monorepos are large, but the projects inside may, individually, not be that complex. So there are ways of making LLMs work well with monorepos (e.g., providing a top-level index of what's inside, how to find projects, and how the repo is set up - see the sketch below). Complexity within an individual project is something current-gen SOTA LLMs (I'm counting Sonnet 4, Opus 4.1, Gemini 2.5 Pro, and GPT-5 here) really suck at handling.
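The kind of index I mean, as a top-level CLAUDE.md - every path and project name here is made up:

    # Monorepo map (read this first)

    - services/billing/   Go, gRPC. Owns invoices and payments.
    - services/search/    Rust. Query parsing and ranking.
    - libs/protos/        Shared protobuf schemas; regenerate with `make protos`.
    - tools/ci/           Build/lint entrypoints; run tools/ci/check.sh before commits.

    To find a project, grep for its name in services/*/README.md.
    Never hand-edit anything under gen/; it is generated from libs/protos/.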
Sure, you can assign discrete little tasks here and there. But they fall short on bigger efforts that require understanding not only how the codebase is designed but also why it's designed that way. Even more so if you need them to make good architectural decisions on something that's not "cookie cutter".
Fundamentally, I've noticed that the chasm between those who are hyper-confident LLMs will "get there soon" and those who are experienced but doubtful depends on the type of development you do. "Ticket-pulling" work is generally scoped well enough that an LLM might seem near-autonomous. More abstract/complex backend/infra/research work, not so much. Still value there, sure. But hardly autonomous.
Could, e.g., a custom-made 100K-token summary of the architecture and relevant parts of the giant repo, plus a base index of where to find more info, be sufficient to let Opus take a large task and split it into small enough subprojects that are farmed out to Sonnet instances with sufficient context?
This seems quite doable with even a small amount of tooling around Claude Code, even though I agree it doesn't have this capability out of the box. I think a large part of this gulf is "it doesn't work out of the box" vs "it can be made to work with a little customization."
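As a rough sketch of what that planner/worker split could look like as standalone tooling against the Anthropic Python SDK (the model aliases, prompts, and ARCHITECTURE.md file are all assumptions; real orchestration would also need to feed results back and verify them):

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    PLANNER = "claude-opus-4-1"   # assumed model aliases; substitute
    WORKER = "claude-sonnet-4-0"  # whatever is current

    def plan(task: str, repo_summary: str) -> list[str]:
        """Ask the big model to split a task into self-contained subtasks."""
        resp = client.messages.create(
            model=PLANNER,
            max_tokens=2000,
            system="You are a tech lead. Split the task into independent "
                   "subtasks, one per line, each carrying enough context "
                   "to be done without seeing the rest of the repo.",
            messages=[{"role": "user",
                       "content": f"Repo summary:\n{repo_summary}\n\nTask: {task}"}],
        )
        return [l for l in resp.content[0].text.splitlines() if l.strip()]

    def execute(subtask: str, repo_summary: str) -> str:
        """Farm one subtask out to the cheaper worker model."""
        resp = client.messages.create(
            model=WORKER,
            max_tokens=4000,
            messages=[{"role": "user",
                       "content": f"Repo summary:\n{repo_summary}\n\nSubtask: {subtask}"}],
        )
        return resp.content[0].text

    if __name__ == "__main__":
        summary = open("ARCHITECTURE.md").read()  # the hand-built index
        for sub in plan("replace FooService calls with BarService", summary):
            print(execute(sub, summary))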
I feel like refutations like this ("you aren't using the tool right", "you should try this other tool") pop up often but are fundamentally worthless, because as long as you're not showing code you might as well be making it up. The blog post gives examples of clear failures that anyone can reproduce themselves; I think it's time vibe-code defenders were held to the same standard.
The very first example is that LLMs lose their mental model of chess when playing a game. OK, so instead ask Claude Code to design an MCP server for tracking chess moves, and vibe code it. That's the very first thing that comes to mind, and I expect it would work well enough.
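For what it's worth, the skeleton of such a server is tiny. A sketch using the official mcp Python SDK plus python-chess (pip install mcp chess); the tool set is my guess at what "tracking chess moves" needs:

    # chess_mcp.py - minimal move-tracking MCP server (sketch)
    import chess
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("chess-tracker")
    board = chess.Board()  # one global game, for simplicity

    @mcp.tool()
    def push_move(san: str) -> str:
        """Play a move in standard algebraic notation, e.g. 'Nf3'."""
        try:
            board.push_san(san)
        except ValueError:  # illegal, ambiguous, or malformed
            return f"illegal move: {san}"
        return board.fen()

    @mcp.tool()
    def legal_moves() -> list[str]:
        """List all legal moves in the current position."""
        return [board.san(m) for m in board.legal_moves]

    @mcp.tool()
    def show_board() -> str:
        """Current position as FEN plus an ASCII diagram."""
        return board.fen() + "\n" + str(board)

    if __name__ == "__main__":
        mcp.run()  # serves over stdio by default

The point being: the model doesn't have to hold the board in its head if a tool holds it for it.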