Comment by bryanrasmussen
10 hours ago
>I'm using Claude every day, and it definitely makes me faster but..
I see a lot of posts about this, and I see a lot of studies, also on HN, that show that this isn't the case.
Now of course the "this isn't the case" stuff is statistical, so there can be individual developers who are faster. It can also be that an individual developer is sometimes faster and sometimes not, but the times that they are faster are so clearly faster that it sort of hides the times that they're not. Statistics of performance over a number of developers can flatten things out. But I don't know that this is the case.
So my question for you, and everyone that claims it makes them so perceptibly and clearly faster - how do you know? Given all the studies showing that it doesn't make you faster, how are you so sure it does?
It's incredibly frustrating arguing these same points, over and over, every time that this comes up. You're asking people who are experienced developers absolutely chewing through checklists and peeking at HN while compiling/procrastinating/eating a sandwich/waiting for a prompt to finish to not just explain but quantify what is plainly obvious to those people, every day. You want us to bring paper receipts, like we have some incentive to lie to you.
From our perspective, the gains are so obvious that it really does feel like you must just be doing something fundamentally wrong not to see the same wins.
So when someone says "I can't make it do the magic that you're seeing" it makes me wonder why you don't have a long list of projects that you've never gotten around to because life gets in the way.
Because... if you don't have that list, to us that translates as painfully incurious. It's inconceivable that you don't have such a list because just being a geek in this moment should be enough that you constantly notice things that you'd like to try. If you don't have that, it's like when someone tells you that they don't have an inner monologue. You don't love them any less, but it's very hard not to look at them a bit differently.
> It's incredibly frustrating arguing these same points, over and over, every time that this comes up. You're asking people who are experienced developers absolutely chewing through checklists and peeking at HN while compiling/procrastinating/eating a sandwich/waiting for a prompt to finish to not just explain but quantify what is plainly obvious to those people, every day. You want us to bring paper receipts, like we have some incentive to lie to you.
This puts what I have been feeling in the recent months into words pretty concisely!
To me, it really is a force multiplier: https://blog.kronis.dev/blog/i-blew-through-24-million-token... (though letting it run unconstrained/unsupervised is a mess. I generally like to make Claude Code create a plan and iterate on it with Opus 4.6, then fire off a review. Since getting the Max subscription I don't really need Cerebras or other providers, though I still appreciate them.)
At the same time I've seen people get really bad results with AI, often on smaller models, or just expecting to give it vague instructions and get good results, with no automated linters or prebuild checks in place, or just copying snippets with no further context in some random chat session.
Who knows, maybe there's a learning curve and a certain mindset that you need to have to get a benefit from the technology, to where like 80% of developers will see marginal gains or even detriment, which will show up in most of the current studies. A bit like how for a while architecturally microservices and serverless were all the rage and most people did an absolutely shit job at implementing them, before (hopefully) enough collective wisdom was gained of HOW to use the technology and when.
Totally! Though I maintain that the only good aspect to microservices is that krazam video. You know the one.
I do get frustrated when I see people not using Plan steps, copy/pasting from web front-ends or expecting to one-shot their entire codebase from a single dense prompt. It's problematic because it's not immediately obvious whether someone is still arguing like it's late 2024, you know what I mean?
Also, speaking for myself I can't recommend that anyone use anything but Opus 4.5 right now. 4.6 has a larger context window, but it's crazy expensive when that context window gets actually used even while most agree that these models get dumber when they have a super-large context. 4.5 actually scores slightly better than 4.6 on agentic development, too! But using less powerful models is literally using tools that are much more likely to produce the sorts of results that skeptics think apply across the board.
>It's incredibly frustrating arguing these same points, over and over,
Quite frankly, there seems to be something incredibly frustrating going on in your life, but I'm not sure that the underlying cause of whatever is weighing on your mind at the moment is that I asked "how do you know that what you are feeling is actually true, in comparison to what studies show should be true?" (rephrased, as it's not reasonable to quote the whole post).
>From our perspective, the gains are so obvious that it really does feel like you must just be doing something fundamentally wrong not to see the same wins.
From my perspective, when I think I am experiencing something that data from multiple sources tells me is not actually happening, I try to figure out how I can prove what I am experiencing. I reflect upon myself: have I somehow deluded myself? No? Then how do I prove it when analysis of many situations similar to my own shows a different result?
You seem to think I mean people saying "Claude didn't help me, it wasn't worth it". No. Just to clarify, although I thought it was really clear: I am talking about the numerous studies that keep being posted on HN (so I'm sure you must have seen them) in which productivity gains from coding agents do not seem to actually show up in the work of those who use them. These are studies conducted by third parties observing the work, not claims made by people performing the work.
I'm not going to go through the rest of your post. I get the urge to be insulting, especially as a stress release if you have had a particularly bad time recently. But frankly, statistically speaking, my life is almost certainly significantly worse than yours, and for that reason, though not that reason alone, I will also quite confidently state, with hardly any knowledge of you specifically but just my knowledge of my own life and of the people I have met throughout it, that my list dwarfs yours.
This takes the cake for one of the strangest replies I've ever received on here.
I'm not sure how or indeed why you draw lines from what I said to my life situation... which is relevant how?
What I apparently did not do a good enough job of conveying is that those "data from multiple sources" get cited and then people immediately reply with "those are old studies". It's circular in the same way that arguing with anti-vax people is circular.
The difference is that unlike vaccines, it's very easy for someone to see how productive they are when using LLMs properly. It's not a subtle difference.
Hence the frustration with people who keep insisting that we're imagining our own productivity. It's not a good faith inquiry.
I'm a principal engineer, been working on the same set of codebases for almost 10 years. I handle the 20% or so of my time that constitutes inbound faster than ever and I know because that inbound volume has clearly increased and yet I have, for the first time ever, begun chipping away at the "nice to have" backlog. My biggest time sink now is interviewing and code reviews -- the latter being directly proportional to the velocity increase across the teams I work with. Actually that's my biggest concern -- we are approaching a breaking point for code review volume.
Sorry I don't have DX stats or token usage stats I can share, but based on the directives from on high, those stats are highly correlated (in the positive).
[edit] And SEV rates are not meaningfully higher.
> everyone that claims it makes them so perceptively and clearly faster - how do you know?
For me, AI tools act like supercharged code search and auto complete. I have been able to make changes in complex components that I have rarely worked on. It saved me a week of effort to find the exact API calls that will do what I needed. The AI tool wrote the code and I only had to act as a reviewer. Of course I am familiar with the entire project and I knew the shape of the code to expect. But it saved me from digging out the exact details.
> For me, AI tools act like supercharged ... search and auto complete.
I think that is a fairly good definition of what an LLM is. I'd say the third leg of the definition is adjustable randomness.
> I see a lot of posts about this, and I see a lot studies, also on HN, that show that this isn't the case.
Most of these studies were done one or more years ago, and predate the deployment and adoption of RLHF-based systems like Claude. Add to that, the AI of today is likely as bad as it's ever going to be (i.e., it's only going to get better). Though I do think the 10x claims are probably unfounded.
I mean obviously the things one reads about will always be a little bit behind, so one of the claims I sometimes see about these studies is that they are out of date, and that if they worked with the new models they would find that wasn't the case. But then that is one of the continuing claims one also sees about LLMs: that the newest model fixes whatever issue one is complaining about. And then the claim gets reiterated.
The thing is when I use an AI I sort of feel these gains, but not any greatness, it's like wow it would have taken me days to write all this reasonable albeit sort of mediocre code. I mean that is definitely a productivity gain. Because a lot of times you need to write just mediocre code. But there are parts where I would not have written it like that. So if I go through fixing all these parts, how much of a gain did I actually get?
Like most posters on HN, I am a conceited jerk, so I can claim that I have worked with lots of mediocre programmers (while ignoring the points where I was mediocre by thinking oh that didn't count, I followed the documentation and how it was suggested to use the API and that was a stupid thing to do) and I certainly didn't fix everything that they did, because there just weren't enough hours in the day.
And they did build stuff that worked, much of the time, so now I've got an automated version of that. Sweet. But how do I quantify the productivity? Since there are claims put forth with statistical backing that the productivity is illusory.
This is just one of those things that tend to affect me badly: I think X is happening, a study shows X does not happen. Am I drinking too much Kool-Aid here or is X really happening!!? How to prove it!!? It is the kind of theoretical, logical problem seemingly designed to drive me out of my gourd.