Comment by crubier
3 days ago
I get it, but as an "AI expert and senior leader" myself in my 1,000-person organization (in relative terms), the disconnect I have is:
A lot of what non-believers say matches "enthusiasm on the ground is lacking as results rarely live up to the extremely rosy promises". They would then say they need 2 weeks to work on a specific project, the good old way, maybe with some light AI use along the way.
But then I'm like "hmm actually let me try this real quick" and I prompt Claude for 3 minutes, and 30 minutes later it has one-shotted the whole "two-week project". It then gets reviewed and merged by the "non-believers". This happens repeatedly.
So overall, I think the lack of enthusiasm is largely a skill issue. Not having the skill is fine, but not being willing to learn the skill is the real issue.
I see things changing, as "non-believers" eventually start to realize that they need to evolve or be toast. But it's slower than I imagined.
I am a strong believer and selected as power user because of AI usage metrics, but I also see perverse incentives -- a colleague was desperately searching for me on the Claude token usage leaderboard (I was part of a different group he did not have access to) -- it was clear he was actively trying to climb that leaderboard.
Meanwhile our average PR ballooned to ~2,000 LoC -- generated with Claude, reviewed with Copilot, but colleagues also review it with Claude because it gives valid nitpicks that bump up your GitHub stats while missing glaring functional, architectural, and overengineering issues.
No way this doesn't blow up down the road with the massive bloat we're creating while getting high on the "good progress" we're making.
Yes, your 3-minute prompt got merged. So was my friend's (ex-programmer, now manager) non-AI-generated PR that a technical TL got stuck on for 2 weeks. Different perspective? Survivor bias? High authority?
Blame your engineering culture, not AI, if metrics such as GitHub stats, number of nitpick reviews, and token usage are what is used to judge one's performance.
In a sane engineering culture, actual customer-visible impact is what is measured, and AI is just a tool to improve that metric, albeit massively.
this is a nice anecdote but i think the real issue is the forcing and kpi-nization of llms top-down for nearly everything
there are still code-quality issues, prompting issues for long-running tasks, some things are just faster and more deterministic with normal code generators or just find-and-replace etc
people are annoyed at the force-feeding of llms/ai into everything even when its not needed
some things can be one-shotted and some things can't, and that is fine and perfectly normal, but execs don't like that because it's not the new hotness
> some things can be one-shotted and some things can't
True but my point is that people vastly underestimate what is one-shottable.
In my experience, 80% of the time an average "non-believer" SW engineer with 7 years of experience says something is not one-shottable, I, with my 15 years of experience, think it is in fact one-shottable. And 20% of the time, I do verify that by one-shotting it in my free time.
I believe that this has happened in some cases but am very skeptical that it is widespread and generalizable at this point. My own experience is that software engineers who think they can easily solve a problem in a domain they know nothing about overrate their ability to do so ~99% of the time.
I'm not talking about coding in domains I know nothing about. I'm talking about coding in domains I've worked in for 15 years
Well "non-believers" don't see any gain from being faster, right? That'll just set expectations of "do a lot more for same". Fear of being "toast" will get you the loyalty you'd expect from fear.
Are you European by any chance? I left Europe to avoid your mentality
the best way I found to deal with non-believers is to have claude run code reviews on their own work. I’ll point it to an older commit and get like 3-page markdown file :) works really, really well.
on one-shotting a 3-minute prompt in 30 minutes, though: software is a living organism, and early gains can (and often do) result in later pains. I do not use this type of argument as it relates to AI, because the follow-up, once the organism spreads its wings into production, seldom makes its way to HN (if this 30-minute one-shot results in a huge security breach, I doubt you would be back here with a follow-up; you will quietly handle it…)
You can get it to generate a 3-page markdown file for any random code, or its own code it just generated. If requested it will produce a seemingly plausible looking review with recommendations and possible issues.
How impressed someone gets by that will depend on the recipient.
output, not recipient. try it on your own code. you won't agree with everything in the example 3-page markdown (much like you push back on a PR), but in a significant number of occasions code changes were made based on the provided output
Unsure if this really tracks, tho. How are you evaluating for the bias that they're merging it not because you're actually an engineer deep in the trenches who knows the second- or third-order effects of slop, but because you're the leader of their 1,000-person org?
This is a genuine question btw, I see plenty of instances of this in my own org.
I see your point, but
1. I am also on the receiving end of this. My boss often codes and vibecodes, and no one feels like they have to merge their stuff. We only merge it if it meets the high quality standard we have. And there is no drama for blocking a PR in our culture.
2. I am fairly deep in the trenches myself and I know when my PRs are high quality and when they are not. And that does not correlate with use of AI in my experience.
I've been on this ride about three or four times over decades. Every new major wave of technology takes a surprisingly long time to be adopted, despite advantages that seem obvious to the evangelists.
I had the exact same experience with, for example, rolling out fully virtualized infrastructure (VMware ESXi) when that was a new concept.
The resistance was just incredible!
"That's not secure!" was the most common push-back, despite all evidence being that VM-level isolation combined with VLANs was much better isolation than huge consolidated servers running dozens of apps.
"It's slower!" was another common complaint, pointing at the 20% overheads that were the norm at the time (before CPU hardware offload features such as nested page tables). Sure, sure, in benchmarks. But in practice, putting a small VM on a big host meant that it inherited the fast network and fibre adapters, and hence could burst far above the performance you'd get from a low-end "pizza box" with a pair of mechanical drives in RAID10.
I see the same kind of naive, uninformed push-back against AI. And that's from people that are at least aware of it. I regularly talk to developers that have never even heard of tools like Codex, Gemini CLI, or whatever! This just hasn't percolated through the wider industry to the level that it has in Silicon Valley.
Speaking of security, the scenarios are oddly similar. Sure, prompt injection is a thing, but modern LLMs are vastly "more secure" in a certain sense than traditional solutions.
Consider Data Loss Prevention (DLP) policy engines. Most use nothing more than simple regular expression patterns looking for things like credit card numbers, social security numbers, etc... Similarly, there are policy engines that look for swearwords, internal project code names being sent to third-parties, etc...
All of those are trivially bypassed even by accident! Simply screenshot a spreadsheet and attach the PNG. Swear at the customer in a language other than English. Put spaces in between the characters in each s w e a r word. Whatever.
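To make the bypass concrete, here's a minimal sketch of the kind of regex-only DLP check described above (the pattern and function name are illustrative, not taken from any real DLP product). Simply inserting spaces between digit groups defeats it:

```python
import re

# Naive DLP rule: 13-16 consecutive digits, the classic credit-card pattern.
CARD_RE = re.compile(r"\b\d{13,16}\b")

def naive_dlp_flags(text: str) -> bool:
    """Return True if the regex-only filter would block this message."""
    return bool(CARD_RE.search(text))

naive_dlp_flags("card: 4111111111111111")      # caught
naive_dlp_flags("card: 4111 1111 1111 1111")   # spaces slip right past it
```

The second call returns False: each 4-digit group is too short to match, so the "protected" number sails through untouched.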
None of those tricks work against a modern AI. Even if you very carefully phrase a hurtful statement while avoiding the banned word list, the AI will know that's hurtful and flag it. Even if you use an obscure language. Even if you embed it into a meme picture. It doesn't matter, it'll flag it!
This is a true step change in capability.
It'll take a while for people to be dragged into the future, kicking and screaming the whole way there.
Would you trust an LLM to recognize a credit card number more reliably than a regular expression can?
You're not forced to use only an LLM for data loss prevention! You can combine it with regex. You can also feed the output of the regex matches to the LLM as extra "context".
Similarly, I was just flipping through the SQL Server 2025 docs on vector indexes. One of their demos was a "hybrid" search that combined exact text match with semantic vector embedding proximity match.
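The same hybrid idea can be sketched outside any particular database: blend an exact keyword-overlap score with embedding cosine similarity. The `alpha` weighting and the toy term-overlap metric are my own illustrative choices, not the SQL Server demo's method:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(query_text, query_vec, doc_text, doc_vec, alpha=0.5):
    """Blend exact term overlap with semantic embedding proximity."""
    q_terms = set(query_text.lower().split())
    d_terms = set(doc_text.lower().split())
    exact = len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0
    return alpha * exact + (1 - alpha) * cosine(query_vec, doc_vec)
```

A doc that matches both the words and the meaning scores highest; one that matches only semantically still surfaces, just lower down.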