
Comment by keeda

2 days ago

I mention a few here: https://news.ycombinator.com/item?id=45379452

> ... just looking at LOC or PRs, which of course is nonsense.

That's basically a variation of "How can they prove anything when we don't even know how to measure developer productivity?" ;-)

And the answer is the same: robust statistical methods! For instance, among other things, they compare the same developers over time doing regular day-job tasks, with the same quality-control processes (review etc.) in place, before and after being allowed to use AI. It's like an A/B test. Spreading across a large N and a long time window accounts for a lot of the day-to-day variation.
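To make the within-developer design concrete, here's a toy sketch (not any study's actual methodology; all numbers are invented). Pairing each developer's "after" measurement with their own "before" measurement cancels out stable differences in individual skill, so even a modest average lift shows up clearly at a large N:

```python
import math
import random

random.seed(0)

# Hypothetical per-developer weekly task throughput, before and after
# AI access was enabled. Numbers are made up for illustration only.
n = 200
before = [random.gauss(10.0, 2.0) for _ in range(n)]
# Simulate a ~15% uplift plus day-to-day noise.
after = [b * 1.15 + random.gauss(0.0, 1.0) for b in before]

# Paired (within-developer) differences cancel out stable individual
# skill differences -- the key idea behind a before/after design.
diffs = [a - b for a, b in zip(after, before)]
mean_diff = sum(diffs) / n
sd = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
t_stat = mean_diff / (sd / math.sqrt(n))

print(f"mean uplift: {mean_diff / (sum(before) / n):.1%}")
print(f"paired t-statistic: {t_stat:.1f}")  # large |t| => significant
```

In practice the studies use richer models than a plain paired comparison, but the intuition is the same: each developer serves as their own control.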

Note that they do not claim to measure individual or team productivity, but they do find a large, statistically significant difference in the data. Worth reading the methodologies to assuage any doubts.

> A Stanford case study found that after accounting for buggy code that needed to be re-worked there may be no productivity uplift.

I'm not sure if we're talking about the same Stanford study; the one in the link above (100K engineers across 600+ companies) does account for "code churn" (ostensibly fixing AI bugs) and still finds an overall productivity boost in the 5 - 30% range. This depends a LOT on the use case (e.g. complex tasks on legacy COBOL codebases actually see a negative impact.)

In any case, most of these studies seem to agree on a 15 - 30% boost.

Note these are mostly from the ~2024 timeframe, using the models from then without today's agentic coding harnesses. I would bet the number is much higher these days. More recent reports from sources like DX find up to a 60% increase in throughput, though I haven't looked closely at this and have some doubts.

> Meta measured a 6-12% uplift in productivity from adopting agentic coding. Thats paltry.

Even assuming a lower-end of 6% lift, at Meta SWE salaries that is a LOT of savings.
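Back-of-the-envelope, the point is easy to see. All numbers below are illustrative assumptions, not Meta figures (a hypothetical headcount and a hypothetical fully loaded cost per engineer):

```python
# Back-of-the-envelope: value of a 6% productivity lift at scale.
# All numbers are illustrative assumptions, not Meta figures.
engineers = 10_000           # assumed engineering headcount
fully_loaded_cost = 500_000  # assumed fully loaded cost per SWE, USD/year
lift = 0.06

# A 6% lift is roughly equivalent to adding this many engineers:
equivalent_headcount = engineers * lift
annual_value = equivalent_headcount * fully_loaded_cost
print(f"~{equivalent_headcount:.0f} engineer-equivalents, "
      f"~${annual_value / 1e9:.1f}B/year")
```

Even at the low end of the range, the equivalent-headcount framing makes the savings concrete.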

However, I haven't come across anything from Meta yet, could you link a source?

I guess it all comes down to what a meaningful gain is. I agree that 10-30% is meaningful, and if “software is a gas” this will lead to more software. But my expectations had become anchored to the frontier labs' marketing (10x), and in that context the data was telling me that LLMs are a good productivity tool rather than a disruptor of human labor.

BTW thanks for the links to the studies

  • Yeah, unfortunately the hype is overwhelming and it takes real work to figure out what the real impact is. At this point the gains are modest but still compelling.

    On the other hand, we are still going through a period of discovering how to effectively use AI in all kinds of work, so the long-term impact is hard to extrapolate at this point. Fully AI-native workflows may look very different from what we are used to.

    Looking at something like the Beads and Gas Town repos, which are apparently fully vibe-coded, is instructive because the workflow is very different... but the volume of (apparently very useful) code produced there by mostly one dude with Claude is insane.

    As such, I can also see how this can become a significant disruptor of human labor. As the parent of a teen who's into software engineering, I am actually a bit concerned for his immediate future.

I don’t work in SWE, so I am just reacting to the claims that LLMs 10x productivity and are leading to mass layoffs in the industry. In that context, the 6-12% productivity gain at a company “all in” on AI didn’t seem impressive. LLMs can be amazing tools, but I still don’t think these studies back up the claims being made by frontier labs.

And I think the 6-12% figure you report is from a 2025, not a 2024, study?