Comment by loire280
16 hours ago
I've seen engineers I respect abandon this way of working as a team for the productivity promise of conjuring PRs with a coding agent. It blows away years of trust so quickly when you realize they stopped reviewing their own output.
Perhaps due to a FOMO outbreak[1], upper management everywhere has demanded AI-powered productivity gains, and based on LoC/PR metrics, it looks like they are getting them.
1. The longer I work in this industry, the more it becomes clear that CxOs aren't great at projecting/planning, and default to copycat, herd behaviors when uncertain.
Would love to be a fly on the wall for a couple of months to see what corporate CxOs actually do.
Surely I could do a mediocre job as a CxO by parroting whatever is hot on Linkedin. Probably wouldn't be a massively successful one, but good enough to survive 2 years and have millions in the bank for that, or get fired and get a golden parachute.
(half) joking - most likely I'm massively trivializing the role.
Funny enough, the author of this blog post wrote another one on exactly that topic, entitled "What do executives do, anyway?"[1]. If you read it, you'll find it's written from quite an interesting perspective, not quite "fly on the wall," but perhaps as close as you're going to get in a realistic scenario.
[1]: https://apenwarr.ca/log/20190926
"Surely I could do a mediocre job as a CxO by parroting whatever is hot on Linkedin"
Having worked for a pretty decent CIO of a global business, I'd say his main job was to travel about, speak to other senior leaders, work out what business problems they had, and then work out, at a very high level, how technology would fit into addressing those problems.
Just parroting latest technology trends would, I suspect, get you sacked within a few weeks.
A charitable explanation for what CxOs do is that they figure out their strategic goals and then focus really hard on ways to herd cats en masse to achieve the goals in an efficient manner. Some people end up doing a great job, some do so accidentally, others just end up doing a job. Sometimes parroting some linkadink drivel is enough to keep the ship on course - usually because the winds are blowing in the right direction or the people at the oars are working well enough on their own.
Software engineers are pushed to their limits (and beyond). Unrealistic expectations are set by Twitter posts like "I shipped an Uber clone in 2 hours with Claude", forcing every developer to crank out PRs, while managers are on the lookout for any kind of perceived inefficiency in tools like GetDX and Span.
If devs are expected to ship 10x faster (or else!), then they will find a way to ship 10x faster.
I always found it weird how most management would do almost anything other than ask their dev team "hey, is there any way to make you guys more productive?"
I've had metrics rammed down my throat, I've had AI rammed down my throat, Scrum rammed down my throat, and I've had various other diktats rammed down my throat.
95% of which slowed us down.
The only time I've been asked is when there is a deadline and it's pretty clear we aren't going to hit it, and even then they're interested in quick wins like "can we bring lunch to you for a few weeks?", not systemic changes.
The fastest and most productive times have been when management just set high level goals and stopped prodding.
I'm convinced that the companies which seek developer autonomy will leave the ones which seek to maximize token usage in the dust in the next tech race.
Putting too much trust in an agent is definitely a problem, but I have to admit I've written about a dozen little apps in the past year without bothering to look at the code and they've all worked really well. They're all just toys and utilities I've needed and I've not put them into a production system, but I would if I had to.
Agents are getting really good, and if you're used to planning and designing up front you can get a ton of value from them. The main problem with them that I see today is people having that level of trust without giving the agent the context necessary to do a good job. Accepting a zero-shotted service to do something important into your production codebase is still a step too far, but it's an increasingly small step.
>> Putting too much trust in an agent is definitely a problem, but I have to admit I've written about a dozen little apps in the past year without bothering to look at the code and they've all worked really well. They're all just toys and utilities I've needed and I've not put them into a production system, but I would if I had to.
I have been doing this too, and I've forgotten half of them. For me the point is that this usage scenario is really good, but it also has no real added value. The moment Claude Code raises its prices 2x this won't be viable anymore, and at the same time, to scale this to enterprise software production levels you probably need to spend as much on an agent as on hiring two SWEs, given that you need at least one to coordinate the agents.
DeepSeek v3.2 tokens are $0.26/0.38 on OpenRouter. That model - released 4 months ago - isn't really good enough by today's standards, but it's significantly stronger than Opus 4.1, which was only released last August! In 12 months I think it's reasonable to expect there will be a model with less cost than that which is significantly stronger than anything available now.
And no, it isn't ONLY because VC capital is being burned to subsidize cost. That is impossible for the dozen smaller providers offering service at that cost on OpenRouter who have to compete with each other for every request and also have to pay compute bills.
Qwen3.5-9B is stronger than GPT-4o and it runs on my laptop. That isn't just benchmarks either. Models are getting smaller, cheaper and better at the same time and this is going to continue.
I think Claude could raise its prices 100x and people would still use it. It'd just shift to being an enterprise-only option and companies would actually start to measure the value instead of being "Whee, AI is awesome! We're definitely going really fast now!"
I’m so disappointed to see the slip in quality by colleagues I think are better than that. People who used to post great PRs are now posting stuff with random unrelated changes, little structs and helpers all over the place that we already have in common modules etc :’(
> little structs and helpers all over the place that we already have in common modules
I've often wondered about building some kind of automated "this codebase already has this logic" linter
Not sure how it would actually work, otherwise I'd build it. But it would definitely be useful
Maybe an AI tool could do something like that nowadays. "Search this codebase for instances of duplicated functions and list them out" sort of thing
>this codebase already has this logic
At first glance this looks like it might be the halting problem in disguise (instead of the general function of the logic, just ask if they both have logic that halts or doesn't halt). I think we would need to allow for false negatives for this to even be theoretically possible, so while identical text comparison would be easy enough, anything past that can quickly become complicated, and you can probably expand the complexity indefinitely by handling more and more edge cases (but never every edge case, due to the underlying halting problem/undecidability of code).
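For what it's worth, one step past identical text comparison is fairly cheap if you accept false negatives. Below is a rough sketch, assuming Python sources and only the standard `ast` module: it hashes each function's syntax tree after dropping the function name, docstring, annotations, and local variable names, then groups functions that share a hash. The names `fingerprint` and `find_duplicates` are made up for illustration, not an existing tool, and this only catches structurally identical code, not logic that is merely equivalent.

```python
import ast
import copy
import hashlib
from collections import defaultdict


class _LocalRenamer(ast.NodeTransformer):
    """Rename parameters and assigned names to v0, v1, ... in order of appearance."""

    def __init__(self, local_names):
        self.local_names = local_names
        self.mapping = {}

    def _rename(self, name):
        if name not in self.local_names:
            return name  # globals and builtins (max, len, ...) keep their names
        return self.mapping.setdefault(name, f"v{len(self.mapping)}")

    def visit_arg(self, node):
        node.arg = self._rename(node.arg)
        node.annotation = None  # ignore type hints
        return node

    def visit_Name(self, node):
        node.id = self._rename(node.id)
        return node


def fingerprint(func):
    """Hash a function's structure, ignoring its name, docstring and local names."""
    func = copy.deepcopy(func)  # never mutate the caller's tree
    func.name = "_"
    func.returns = None
    first = func.body[0]
    has_docstring = (
        isinstance(first, ast.Expr)
        and isinstance(first.value, ast.Constant)
        and isinstance(first.value.value, str)
    )
    if has_docstring and len(func.body) > 1:
        func.body = func.body[1:]
    local_names = {n.arg for n in ast.walk(func) if isinstance(n, ast.arg)}
    local_names |= {
        n.id
        for n in ast.walk(func)
        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)
    }
    _LocalRenamer(local_names).visit(func)
    return hashlib.sha256(ast.dump(func).encode()).hexdigest()


def find_duplicates(sources):
    """Map of {path: source text} -> list of groups of (path, function name)."""
    groups = defaultdict(list)
    for path, text in sources.items():
        for node in ast.walk(ast.parse(text)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                groups[fingerprint(node)].append((path, node.name))
    return [members for members in groups.values() if len(members) > 1]
```

Feeding it two files that define the same clamp helper under different function and parameter names puts both in one group. Reordering statements or swapping a loop for a comprehension defeats it, which is exactly the false-negative territory described above.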
This is the part that doesn't get talked about enough. Code review was never just about catching bugs; it was how teams built shared understanding of the codebase. When someone skips reviewing their own AI-generated PR, they're not just shipping unreviewed code, they're opting out of knowing what's in their own system. The trust problem isn't really about the AI output quality, it's about whether the person submitting it can answer questions about it six months from now.