
Comment by g947o

6 days ago

None of those wild experiments are running on a "real", existing codebase that is more than 6 months old. The thing they don't talk about is that nobody outside these AI companies wants to vibe code with a 10 year old codebase with 2000 enterprise customers.

As soon as you start to work with a codebase that you care about and need to seriously maintain, you'll see what a mess these agents make.

Even on codebases within the half-year age group, these LLMs often produce nasty (read: ungodly verbose) implementations that become a maintainability nightmare, even for the LLMs that wrote it all in the first place. I know this because we've had a steady trickle of clients and prospects expressing "challenges around maintainability and scalability" as they move toward "production readiness", and, of course, asking if we can implement "better performing coding agents". As if improved harnesses or similar guardrails could solve what is, in my view, a deeper problem.

The practical and opportunistic response is to tell them "Tough cookies" and watch the problems steadily compound into more lucrative revenue opportunities for us. I really have no sympathy for these people, because half of them were explicitly warned against this approach upfront but were psychologically incapable of adjusting expectations or delaying LLM deployment until the technology proved itself. If you've ever had your professional opinion dismissed by the same people who regard you as the SME, you understand my pain.

I suppose I'm just venting now. While we are now extracting money from the dumbassery, the client entitlement and the emotional management that often come with putting out these fires never make for a good time.

  • This is exactly why enforcement needs to be architectural. The "challenges around maintainability and scalability" your clients hit exist because their AI workflows had zero structural constraints. The output quality problem isn't the model, it's the lack of workflow infrastructure around it.

    • Is this not just “build a better prompt” in more words?

      At what point do we realize that the best way to prompt is with formal language? I.e. a programming language?

      5 replies →

I maintain serious code bases and I use LLM agents (and agent teams) plenty -- I just happen to review the code they write, I demand they write the code in a reviewable way, and I use them mostly for menial tasks that are otherwise unpleasant timesinks I'd have to do myself. There are many people like me who just quietly use these tools to automate the boring chores of dealing with mature production code bases. We are quiet because this is boring day-to-day work.

E.g. I use these tools to clean up or reorganize old tests (with coverage and diff viewers catching things I might miss), update documentation with cross links (with documentation linters checking for errors I miss), convert tests into benchmarks running as part of CI, make log file visualizers, and many more.
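For instance, turning an existing test into a CI benchmark can be as small as wrapping the call in pytest-benchmark's fixture. A rough sketch, assuming a pytest setup; parse_log_line here is a hypothetical stand-in for whatever the original test covered:

    # hypothetical example -- parse_log_line stands in for the real function under test
    def parse_log_line(line: str) -> dict:
        level, _, message = line.partition(": ")
        return {"level": level, "message": message}

    def test_parse_log_line_bench(benchmark):
        # pytest-benchmark's `benchmark` fixture times the callable;
        # the assertion preserves the original test's correctness check
        result = benchmark(parse_log_line, "INFO: cache warmed")
        assert result["level"] == "INFO"

In CI the same file runs as an ordinary test, or as a timing job with something like "pytest --benchmark-only".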

These tools are amazing for dealing with the long tail of boring issues that you never get to, and when used in this fashion they actually abruptly increase the quality of the codebase.

  • Yeah esp. the latest iterations are great for stuff like “find and fix all the battery drainers.” Tests pass, everyone’s happy.

I work at a company with approximately $1 million in revenue per engineer and multiple 10+ year old codebases.

We use agents very aggressively, combined with beads, tons of tests, etc.

You treat them like any developer, and review the code in PRs, provide feedback, have the agents act, and merge when it's good.

We have gained tremendous velocity and have been able to tackle far more out of the backlog that we'd been forced to keep in the icebox before.

This idea of setting the bar at "agents work without code reviews" is nuts.

  • > We have gained tremendous velocity and have been able to tackle far more out of the backlog that we'd been forced to keep in the icebox before.

    Source? Proofs? It's not the first, second or even third round on this rodeo.

    In other words, notto disu shittu agen.

  • Why are you invoking experience and authoritative framing about a technology we've been using for less than 6 months?

    • The person they are responding to used an authoritative framing that isn't true.

      I know people have emotional responses to this, but if you think people aren’t effectively using agents to ship code in lots of domains, including existing legacy code bases, you are incorrect.

      Do we know exactly how to do that well? Of course not; we still fruitlessly argue about how humans should write software. But there is a growing body of techniques for agent-first development, and a lot of those techniques are naturally converging because they work.

      10 replies →

    • 6 months?

      I've been using LLMs to augment development since early December 2023. I've expanded the scope and complexity of the changes since then as the models have grown more capable. Before beads existed, I used a folder of markdown files for externalized memory.

      Just because you were late to the party doesn't mean all of us were.

      3 replies →

    • If you hired a person six months ago and in that time they'd produced a ton of useful code for your product, wouldn't you say with authoritative framing that their hiring was a good decision?

      2 replies →

  • If there is one thing I have seen, it is that there is a subset of intellectual people who are still averse to learning new tools, who cling to ideological beliefs (I feel this, though; watching programming as you know it die, in a way, kinda makes you not want to follow it), and who would prefer to just be lazy and not properly dogfood and learn their new tooling.

    I'm seeing amazing results with agents too, when they're provided a well-formed knowledge base and directed through each piece of work like it's a sprint. Review and iron out scope requirements and the API surface/contract, have agents create multi-phase implementation plans and technical specifications in a shared dev directory, make high-quality change logs, and document future considerations and any bugs/issues found that can be deferred. Every phase gets a human code review, along with Gemini, which is great at catching drift from spec and bugs in less obvious places.

    While I'm sure an enterprise code base could still be an issue and would require even more direction (and I won't let Opus touch Java; it codes like an enterprise Java greybeard who loves to create an interface/factory for everything), I think that's still just a tooling issue.

    I'm not in the super-pro-AI camp, but I have followed its development and used it throughout, and for the first time I am actually amazed and bothered, and convinced that if people don't embrace these tools, they will be left behind. No, they don't 10-100x a junior dev, but if someone has proper domain knowledge to direct the agent and does the research alongside it to iron things out, with the human actually understanding the problem space, 2-5x seems quite reasonable currently, driven by a capable developer. But this just moves the work to review and to documentation maintenance/crafting, which has its own fatigue and is less rewarding for a programmer's mind that loves to solve challenges and gets dopamine from it.

    But given how many people are averse... I don't think anyone who embraces it is going to have job security issues and be replaced, but there are many capable engineers who might, due to their own reservations. I'm amazed by how many intelligent and capable people treat LLMs/agents like a political straw man; there is no reasoning with them. They say vibe coding sucks (it does, for anything more than a small throwaway that won't be maintained), yet their example of agents/LLMs not working is that they can't just take a prompt, produce the best code ever, and automatically manifest the knowledge needed to work on their codebase. You still need to put in effort and learn to actually perform the engineering with the tools, but if they can't take a paragraph, with no AGENTS.md, and turn it into a feature or bug fix, they're no good to them. Yeah, they will get distracted and fuck up, just like 9/10 developers would if you threw them into the same situation and told them to get to work with no knowledge of the codebase or domain and have their PR in by noon.

That is also my experience. Doesn't even have to be a 10 year old codebase. Even a 1 year old codebase. Any one that is a serious product that is deployed in production with customers who rely on it.

Not to say that there's no value in AI written code in these codebases, because there is plenty. But this whole thing where 6 agents run overnight and "tada" in the morning with production ready code is...not real.

  • I don't believe that devs are the audience. They are pushing this to decision makers, hoping they'll think the state of the art is further ahead than it is. These folks then think about how helpful it'd be to have even 20% of that capability. When there is so much noise in the market, and everyone seems to be overtaking everyone else, this kind of approach is the only one that gets attention.

    Similarly, a lot of the AGI-hype comments exist to expand the scope of the space. It's not real, but it helps to position products and win arguments based on hypotheticals.

Also, anything that doesn't look like a SaaS app does very badly. We ran an internal trial on embedded firmware and concluded the results were unsalvageably bad. It doesn't help that the embedded environment is very unfriendly to standard testing techniques.

  • You will need to build an accessible knowledge base for the topics for which the models have not had extensive training.

    Proprietary embedded system documentation is not exactly ubiquitous. You must provide reference material and guardrails where the training is weakest.

    This applies to everything in ML: it will be weakest at the edges.

I feel like you could have correctly stated this a few months ago, but the way this is "solved" now is by multiple agents that babysit each other and review each other's output - it's unreasonably effective.

You can get extremely good results assuming your spec is actually correct (and you're willing to chew through massive quantities of tokens / wait long enough).
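A minimal sketch of that babysitting loop, with the actual model calls left as stubs (draft_change and review_change are hypothetical names, not any particular vendor's API):

    # hypothetical generate/review loop -- the agent calls are stubs
    def draft_change(spec: str, feedback: list[str]) -> str:
        # stand-in for the coding agent producing a diff from the spec plus reviewer feedback
        raise NotImplementedError("call your coding agent here")

    def review_change(spec: str, diff: str) -> list[str]:
        # stand-in for a second agent reviewing the diff against the spec;
        # an empty list means it has no objections
        raise NotImplementedError("call your reviewing agent here")

    def run_until_clean(spec: str, max_rounds: int = 5) -> str:
        feedback: list[str] = []
        for _ in range(max_rounds):
            diff = draft_change(spec, feedback)
            feedback = review_change(spec, diff)
            if not feedback:  # reviewer approved
                return diff
        raise RuntimeError("reviewer never approved; escalate to a human")

The spec being correct matters because it's the only fixed point both agents are judged against; the loop just burns tokens until the reviewer runs out of objections.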

  • > You can get extremely good results assuming your spec is actually correct

    Is it ever the case that the spec is entirely correct (and without underspecified parts)? I thought the reason we write code is because it's much easier to express a spec as code than it is to get a similar level of precision in prose.

    • I think this is basically the only SWE-type job that will still exist beyond the (relatively near) future: finding the right spec and feeding it to the bots. And in this way I think even complete laypeople will be able to create software using the bots, but you'd still want somebody with a deeper understanding in this role for serious projects.

      The bots even now can really help you identify technical problems / mistakes / gaps / bad assumptions, but there's no replacing "I know what the business wants/needs, and I know what makes my product manager happy, and I know what 'feels' good" type stuff.

      1 reply →

  • And unreasonably expensive unless you are Big Corp. Die startups, die. Welcome to our Cyberpunk overlords.

    • Or hey, the VCs can self-deal by funding new startups that buy bot time from AI firms the same VCs already fund.

      No pesky developers siphoning away equity!

My Claude Code has been running for weeks on end, churning through a huge task list almost unattended on a complex 15-year-old code base, auto-committing thousands of features. It is high-quality code that will go live very soon.

The Gas Town Discord has two people who are transforming extremely legacy in-house Java frameworks. They're not reporting great success yet, but it's also probably work that just wouldn't be done otherwise.

Oh, that means you don't know how to use AI properly. Also, it's only 2026; imagine what AI agents can do in a few years /s