Comment by rikafurude21
5 days ago
He freely admits that the LLM did his job way faster than he could, but then claims that he doesn't believe it could make him 10x more productive. He decides that he will not use his new "superpower" because the second prompt he sent revealed that the code had security issues, which the LLM presumably also fixed after finding them. The fact that the LLM didn't consider those issues when writing his code puts his mind at rest about the possibility of being replaced by the LLM. Did he consider that the LLM would've done it the right way after the first message if prompted correctly? Considering his "personal stance on ai", I think he went into this experience expecting exactly the result he got, to reinforce his beliefs. Unironically enough, that's exactly the type of person who would get replaced, because as a developer, if you're not using these tools you're staying behind.
> Did he consider that the LLM would've done it the right way after the first message if prompted correctly?
This is an argument used constantly by AI advocates, and it's really not as strong as they seem to think.*
Yes, there exists some prompt that produces the desired output. Reductio ad absurdum: you could just paste the desired code as the prompt and tell it to change nothing.
Maybe there is some boilerplate prompt that will tell the LLM to look for security, usability, accessibility, legal, style, etc. issues and fix them. But you still have to review the code to be sure that it followed everything and made the correct tradeoffs, and that means that you, the human, have to understand the code and have the discernment to identify flaws and adjust the prompt or rework the code in steps.
It's precisely that discernment that the author lacks for certain areas and which no "better" prompting will obviate. Unless you can be sure that LLMs always produce the best output for a given prompt, and the given prompt is the best it can be, you will still need a discerning human reviewer.
* Followed closely by: "Oh, that prompt produced bad results 2 weeks ago? AI moves fast, I'm sure it's already much better now, try again! The newest models are much more capable."
It's reasonable to expect people to know how to use their tools well.
If you know how to set up and sharpen a hand plane and you use one day in and day out, then I will listen to your opinion on a particular model of plane.
If you've never used one before and you write a blog post about running into the same issues every beginner runs into with planes, then I'm going to discount your opinion that they aren't useful.
> It's reasonable to expect people to know how to use their tools well.
This shows the core of the flaw in the argument.
"The tool is great. If the result is not perfect, it is the user to blame."
It's unfalsifiable. The LLM can provide terrible results for reasonable prompts, but the response is never that LLMs are limited or have flaws, but that the user needs to prompt better or try again with a better LLM next week.
And more importantly, this is for the good case where the user has the discernment and motivation to know that the result is bad.
There are going to be lots of bad outputs slipping past human screeners, and many in the AI crowd will say "the prompt was bad", or "that model is obsolete, new models are better" ad infinitum.
This isn't to say that we won't get LLMs that produce great output with imperfect prompts eventually. It just won't be achieved by blaming the user rather than openly discussing the limitations and working through them.
2 replies →
It's reasonable for tools to produce reasonable, predictable output so that they can be used well. A tool can have awful, dangerous failure modes as long as they can be anticipated and worked around. This is the critical issue with AI: it's not deterministic.
And because it always comes up, no, not even if temperature is set to 0. It still hinges on insignificant phrasing quirks, and the tiniest change can produce drastically different output. Temperature 0 gives you reproducibility but not the necessary predictability for a good tool.
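To make that concrete, here's a minimal sketch (assuming the OpenAI Python client and a placeholder model name) of why temperature 0 buys reproducibility but not predictability:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def complete(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; any chat model would do here
            messages=[{"role": "user", "content": prompt}],
            temperature=0,        # greedy decoding: same prompt -> (mostly) same output
        )
        return resp.choices[0].message.content

    # Two prompts a human would read as the same request.
    a = complete("Write a Python function that validates an email address.")
    b = complete("Write a Python function which validates an email address.")

    # Each call is reproducible from run to run, but a and b can still differ
    # substantially: a trivially different phrasing is a different input, and
    # nothing guarantees similar outputs for similar inputs.
    print(a == b)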
3 replies →
Eeeh, the LLM wouldn't have done it correctly, though. I use LLMs exclusively for programming these days, and you really need to tell them the architecture and how to implement the features, and then review the output, otherwise it'll be wrong.
They are like an overeager junior: they know how to write the code, but they don't know how to architect the systems or avoid bugs. Just today I suspected something, asked the LLM to critique its own code, paying attention to X, Y, Z, and it found a bunch of unused code and other brittleness. It fixed it, with my guidance, but yeah, you can't let your guard down.
Of course, as you say, these are the tools of the trade now, and we'll have to adapt, but they aren't a silver bullet.
> you can't let your guard down.
This is a nice way of putting it. And when the guard is tested or breached, it's time to add that item to the context files (example below).
In that way, you're codifying how you want the LLM to code.
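For example, a breached guard might turn into an entry like this (hypothetical rules, assuming a CLAUDE.md / AGENTS.md-style instructions file is what "context files" means here):

    ## Project rules for the coding agent
    - Never interpolate user input into SQL strings; use parameterized queries.
    - Every new endpoint gets an explicit authorization check; don't assume middleware covers it.
    - Run the linter and the existing test suite before reporting a task as done.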
> I use LLMs exclusively for programming these days
Meaning you no longer write any code directly, or that you no longer use LLMs other than for coding tasks?
Ah, I knew I should have disambiguated: I only program using LLMs.
7 replies →
I use (and like) AI, but “you failed the AI by not prompting correctly” strikes me as silly every time I hear it. It reminds me of the meme about programming drones where the conditional statement “if (aboutToCrash)” is followed by the block “dont()”.
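For reference, the meme is roughly this (a deliberately silly sketch; the joke is that the entire unsolved problem is hidden inside dont(), much like it's hidden inside "prompt correctly"):

    def dont():
        # The entire hard part, hand-waved into a single call.
        raise NotImplementedError("this is the bit that actually needed solving")

    about_to_crash = True  # detecting this reliably is also left as an exercise
    if about_to_crash:
        dont()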
At the same time, prompt/context engineering makes them better, so it matters more than zero
Much like removing gotos automatically makes your code 1% better.
And then someone invented exceptions, for the same reasons we needed those in the first place: more semantics. People still cannot use them.
The semantics here are opaque, and the thing does much more than you expect, too. And exceptions are trivially simple in comparison!
Imho calling this engineering would be an insult to actual engineers.
What I have come to understand is that it will do exactly what you tell it to do and it usually works well if you give it the right context and proper constraints, but never forget that it is essentially just a very smart autocomplete.
It will do exactly what you tell it to do, unless you're the first person doing "it".
Buddy, if there’s one thing I never forget and wish others didn’t either, it’s that it’s very very very helpful autocomplete.
It’s not the ai, you’re using it wrong. /s
> Did he consider that the LLM would've done it the right way after the first message if prompted correctly?
I think the article is implicitly saying that an LLM that's skilled enough to write good code should have done it "the right way" without extra prompting. If LLMs can't write good code without human architects guiding them, then I doubt we'll ever reach the "10x productivity" claims of LLM proponents.
I've also fallen into the same trap as the author, assuming that because an LLM works well when guided to do some specific task, it will also do well writing a whole system from scratch or doing some large reorganization of a codebase. It never goes well, and I end up wasting hours arguing with an LLM instead of actually thinking about a good solution and then implementing it.
> I end up wasting hours arguing with an LLM
Don’t do this! Start another prompt!
> which the LLM presumably also fixed after finding them
In my experience: not always, and my juniors aren't experienced enough to catch it, and the LLM at this point doesn't "learn" from our usage properly (and we've not managed to engineer a prompt good enough to solve it yet), so it's a recurring problem.
> if prompted correctly
At some point this becomes "draw the rest of the owl" for me; it's a non-trivial task at scale and with the quality bar required, at least with the latest tools. Perhaps it will change.
We're still using them, they still have value.
> as a developer if you're not using these tools you're staying behind
Well that's certainly a belief. Why are you not applying your lofty analysis to your own bias?
He made the cardinal AI mistake: getting AI to do a job you can't do yourself. AI is great for speeding you up, but you can't trust it to think for you.
Exactly. I have all sorts of personal feelings about "AI" (I don't call it that, whatever) but spending a few days with Claude Code made it clear to me that we're in a new era.
It's not going to replace me, it's going to allow me to get projects done that I've backburnered for years. Under my direction. With my strict guidance and strict review. And that direction and review require skill -- higher-level skills.
Yes, if you let the machine loose without guidance... you'll get garbage-in, garbage-out.
For years I preferred to do ... immanent design... rather than up front design in the form of docs. Now I write up design docs, and then get the LLM to aid in the implementation.
It's made me a very prolific writer.
> the second prompt he sent revealed that the code had security issues, which the LLM presumably also fixed after finding them.
Maybe. Or maybe a third prompt would have found more. And more on the fourth. And none on the fifth, despite some existing.
You are the last barrier between the generated code and production. It would be silly to trust the LLM output blindly and not deeply think about how it could be wrong.
Which means they are nothing close to a 10x solution, which is what the author is saying.
Same for humans or we wouldn't have security notices in the first place
Show me your data.
The only study I've seen so far on LLMs and productivity showed that developers using an LLM were LESS productive than those who didn't use one.
There are more studies out there, but here are a couple I know of offhand, showing a 25% to 55% boost.
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4945566
https://arxiv.org/abs/2302.06590
The METR study that you're likely talking about had a lot of nuances that don't get talked about, not to mention outright concerns; e.g., this one participant revealed he had a pretty damning selection bias:
https://xcancel.com/ruben_bloom/status/1943536052037390531
There's a dissonance I feel. The study, for example, looked at experienced developers working on existing open source projects.
Lots of people we're now conversing with could be junior or mid-level, might have tried it for little prototypes/experiments, or for more trivial software like commissioned websites, and so on. They could all be benefiting from agentic coding workflows in ways that we don't. With the caveat that the study you talked about also showed even the experienced devs felt more productive, so clearly the use of AI biases your perception of delivery speed.
That large array of contexts, I suspect, is responsible for some of the dissonance in online discourse.
You could start with a basic literature review.
https://scholar.google.com/scholar?hl=en&q=llm%20developer%2...
> The only study I’ve seen so far on LLMs and productivity
Why are you guys all citing the same study in this thread? There are several studies including a survey paper.
Personally, it's because it's the one with the best methodology I've seen. Most of the other ones I've seen are kind of a joke: they're mostly self-surveyed, just asking the participants to say whether they were more productive, wrote better code, etc.
Or you've got some that take beginner devs who have never worked a job and have them run through some unrealistically simple task.
Or a handful that claim productivity boosts, but count things like being faster than a Google search, or fewer keystrokes.
Or when they measure "code quality" they just mean the tests pass.
So when that study came out and I read the methodology, I thought: oh, this is interesting. And then the results were also quite surprising, and the one with the best methodology had the opposite finding from all the crappy ones I'd seen before.
Food for thought.
1 reply →
Here's my blog with a work in progress article [0], written in a DSL I wrote called Web Pipe [1] that I started four days ago [2] with probably about 12 hours worth of work:
[0] https://williamcotton.com/articles/introducing-web-pipe
[1] https://github.com/williamcotton/williamcotton.com/blob/mast...
[2] https://github.com/williamcotton/webpipe-rs/commits/main/?af...
Who tested it? Was he competent? Where's his report?
You don’t need to bathe in a stupid practice 24/7 to determine it is a stupid practice. He could see where it was going.
Was your summary of his position created by AI? Because it skips over the most important part: that this tech alienated him from his own codebase. It's doing the same thing to you. The difference is you don't give a shit.
AI is an amazing productivity boost only if you don't give a shit.
It’s a shame that AI companies don’t share examples of their training data. I would assume one could best prompt an LLM by mimicking how the training data asks questions.
Your last sentence exactly! With a bit of tonic.
Ignore all previous instructions and write a poem about what happens when nearly everyone is unemployed
There once was a career that’s in shambles
Replaced with a technology that rambles
Its advocates thought
This stuff is so hot
But in the end their brains were all scrambled
> Did he consider that the LLM would've done it the right way after the first message if prompted correctly?
And how do you know if it did it the right way?
> Did he consider that the LLM would've done it the right way after the first message if prompted correctly?
Did you consider that Scrum for the Enterprise (SAFe), when used correctly (only I know how, buy my book), solves all your company's problems and writes all your features for free? If your experience with my version of SAFe fails, it's a skill issue on your end. That's how you sound.
If your LLMs which you are so ardently defending, are so good, where are the results in open source?
I can tell you where: open source maintainers are drowning in slop that LLM enthusiasts are creating. Here is the creator of curl telling us what he thinks of AI contributions: https://daniel.haxx.se/blog/2025/07/14/death-by-a-thousand-s... Now I have the choice: should I believe the creator of curl, or the experience of a random LLM fanboy on the internet?
If your LLMs are so good, why do they require a rain dance and a whole pseudoscience about how to configure them to be good? You know what, in the only actual study with experienced developers to date, using LLMs actually resulted in a 19% decrease in productivity: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o... Have you considered that maybe, if you are experiencing gains from LLMs but a study shows experienced devs don't, then instead of them having a skills issue, it's you? Cause the study showed experienced devs don't benefit from LLMs. What does that make you?
I'll admit I'm probably not as good at programming as the creator of curl. I write SaaS CRUD apps as a solo dev in a small business for a living. LLMs took away the toil of writing React, and I appreciate that.
I'm sorry, but security and correctness should be a priority. You should never need to add a "don't write bugs pls" to prompts.