Comment by JPKab
6 days ago
I work at a company with approximately $1 million in revenue per engineer and multiple 10+ year old codebases.
We use agents very aggressively, combined with beads, tons of tests, etc.
You treat them like any developer: review the code in PRs, provide feedback, have the agents act on it, and merge when it's good.
We have gained tremendous velocity and have been able to tackle far more of the backlog that we'd been forced to keep in the icebox before.
This idea of setting the bar at "agents work without code reviews" is nuts.
> We have gained tremendous velocity and have been able to tackle far more of the backlog that we'd been forced to keep in the icebox before.
Source? Proof? It's not the first, second, or even third round at this rodeo.
In other words, notto disu shittu agen.
Why are you speaking from experience, with authoritative framing, about a technology we've been using for less than six months?
The person they are responding to dictated an authoritative framing that isn't true.
I know people have emotional responses to this, but if you think people aren’t effectively using agents to ship code in lots of domains, including existing legacy code bases, you are incorrect.
Do we know exactly how to do that well? Of course not; we still fruitlessly argue about how humans should write software. But there is a growing body of techniques for agent-first development, and a lot of those techniques are naturally converging because they work.
I think programming effectiveness is inherently tied to the useful life of software, and we will need to see that play out.
This is not to suggest that AI tools have no value, but that "I just have agents writing code and it works great!" has yet to meet its test.
6 months?
I've been using LLMs to augment development since early December 2023, and I've expanded the scope and complexity of the changes as the models have improved. Before beads existed, I used a folder of markdown files for externalized memory.
Just because you were late to the party doesn't mean all of us were.
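For what it's worth, the markdown-folder trick needs almost no tooling. A rough sketch of the idea in Python; the layout and helper names here are illustrative, not my exact setup:

    # "Folder of markdown files as externalized memory": read the notes on a
    # topic into the next session's prompt, append what was learned afterwards.
    from pathlib import Path

    MEMORY = Path("memory")  # one markdown file per topic (illustrative layout)

    def recall(topic: str) -> str:
        """Concatenate every saved note on a topic for pasting into a prompt."""
        return "\n\n".join(p.read_text() for p in sorted(MEMORY.glob(f"{topic}*.md")))

    def remember(topic: str, note: str) -> None:
        """Append a note so the next session doesn't start cold."""
        MEMORY.mkdir(exist_ok=True)
        with (MEMORY / f"{topic}.md").open("a") as fh:
            fh.write(note.rstrip() + "\n\n")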
> Just because you were late to the party doesn't mean all of us were.
It wasn't a party I liked back in 2023. I'm just repeating the same stuff I see said over and over again here, but there has been a step change with Opus 4.5.
You can still see it in action now, because the other models are still where Opus was a while ago. I recently needed to make a small change to a script I was using. It's a tiny (50 line) script written with the help of AIs ages ago, but it was subtly wrong in so many ways. It has become clear that neither the AIs (I used several and cross-checked) nor I had a clue what we were dealing with. The current "seems to work" version was created only after much blood was spilt over misunderstandings, exposing bugs that had to be fixed.
I asked Claude 4.6 to fix yet another misunderstanding, and the result was a patch changing the minimum number of lines to get the job done. Just reviewing such a surgical modification was far easier than doing it myself.
I gave exactly the same prompt to Gemini. The result was a wholesale rearrangement of the code. Maybe it was good, but the effort to verify that was far larger than just doing the change myself. It was a very 2023 experience.
The usual 2023 experience for me was asking an AI to write some greenfield code and getting a result that looked like someone had changed the variable names in something they'd found on the web after a brief search for code that might do a similar job. If you got lucky, it had found something that was indeed very similar, but in my case that was rare. Asking it to modify code unlike anything it had seen before was like asking someone to poke your eyes with a stick.
As I said, some of the organisers of this style of party seem to have gotten their act together, so now it is well worth joining their parties. But this is a newish development.
If you hired a person six months ago and in that time they'd produced a ton of useful code for your product, wouldn't you say with authoritative framing that their hiring was a good decision?
I would, but I haven't seen that. What I've seen is a lot of people setting up cool agent workflows which feel very productive, but aren't producing coherent work.
This may be a result of me using the tools poorly, or more likely of evaluating merits that matter less than I think. But we can't tell yet; people only just invented these agent workflows, and we haven't seen the results play out.
Note that the situation was not that different before LLMs. I've seen PMs with all the tickets set up, engineers making PRs with reviews, etc., and no progress being made on the product. The process can be emulated without substantive work.
Why is it always "tons of code"? Unless you are paid by the line of code, writing "tons of code" makes no sense.
If there is one thing I have seen, it is that a subset of intellectual people will still be averse to learning new tools, cling to ideological beliefs (I feel this too; watching programming as you know it die, in a way, kinda makes you not want to follow it), and prefer to just be lazy rather than properly dogfood and learn their new tooling.
I'm seeing amazing results with agents too, when they're provided a well-formed knowledge base and directed through each piece of work like it's a sprint. Review and iron out the scope requirements and API surface/contract, have the agents create multi-phase implementation plans and technical specifications in a shared dev directory, and have them keep high-quality changelogs, documenting future considerations and any bugs/issues found that can be deferred. Every phase gets a human code review, along with Gemini, which is great at catching drift from spec and bugs in less obvious places.
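To make that concrete, the per-work-item structure I have in mind looks roughly like this; the file names and layout are illustrative, not my exact tooling:

    # Scaffold a shared dev directory for one piece of work. Each phase gets
    # its own plan and spec file so it can be reviewed in isolation (by a
    # human, plus a second model checking for drift from the spec).
    from pathlib import Path

    DOCS = ["scope.md", "api-contract.md", "changelog.md", "deferred-issues.md"]

    def scaffold(work_item: str, phases: int) -> None:
        root = Path("dev") / work_item
        root.mkdir(parents=True, exist_ok=True)
        for name in DOCS:
            (root / name).touch()
        for i in range(1, phases + 1):
            (root / f"phase-{i:02d}-plan.md").touch()
            (root / f"phase-{i:02d}-spec.md").touch()

    scaffold("search-reindex", phases=3)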
While I'm sure an enterprise codebase could still be an issue and would require even more direction (and I won't let Opus touch Java; it codes like an enterprise Java greybeard who loves to create an interface/factory for everything), I think that's still just a tooling issue.
I'm not in the super pro-AI camp, but I have followed its development and used it throughout. For the first time I am actually amazed, and bothered, and convinced that if people don't embrace these tools, they will be left behind. No, they don't 10-100x a junior dev, but if someone has proper domain knowledge to direct the agent and performs dual research with it to iron things out, with the human actually understanding the problem space, 2-5x seems quite reasonable currently when driven by a capable developer. But this just moves the work to review and documentation maintenance/crafting, which has its own fatigue and is less rewarding for a programmer's mind that loves to solve challenges and gets dopamine from it.
But given how many people are averse... I don't think anyone who embraces it is going to have job-security issues and be replaced, but there are many capable engineers who might, due to their own reservations. I'm amazed by how many intelligent and capable people treat LLMs/agents like a political straw man; there is no reasoning with them. They say vibe coding sucks (it does, for anything more than a small throwaway that won't be maintained), yet their example of agents/LLMs not working is that they can't take a bare prompt, with no AGENTS.md, produce the best code ever, and magically manifest the knowledge needed to work on their codebase. You still need to put in effort and learn to actually perform the engineering with these tools, but if they don't take a paragraph and turn it into a feature or bug fix, they're no good to them. Yeah, they will get distracted and fuck up, just like 9 out of 10 developers would if you threw them into the same situation with no knowledge of the codebase or domain and told them to have their PR in by noon.