
Comment by davnicwil

21 hours ago

I would say that while LLMs do improve productivity sometimes, I flatly cannot believe a claim (at least without direct demonstration or evidence) that one person is doing the work of 20 with them, in December 2025 at least.

I mean from the off, people were claiming 10x probably mostly because it's a nice round number, but those claims quickly fell out of the mainstream as people realised it's just not that big a multiplier in practice in the real world.

I don't think we're seeing this in the market, anywhere. With something like 1 engineer doing the job of 20, what you're talking about is basically whole departments at mid-sized companies compressing to one person. Think about that: it has implications for all the additional management staff on top of the 20 engineers too.

It'd mean either a complete restructure and rethink of the way software orgs work, or we'd be seeing incredible, crazy deltas in the output of software companies this year, of the kind that would be impossible not to notice.

This is just plainly not happening. Look, if it happens, it happens, whether in '26, '27, '28 or '38. It'll be a cool and interesting new world if it does. But it just... hasn't happened, and isn't happening, in '25.

  > one person is doing the work of 20 with them, in December 2025 at least

It reminds me of the OOP hype from the '90s, but maybe it will indeed eventually be true this time...?

It's entirely dependent on the type of code being written. For verbose, straightforward code with clear-cut test scenarios, one agent running 24/7 can easily do the work of 20 FT engineers. This is a best-case scenario.

Your productivity boost will depend entirely on a combination of how much you can remove yourself from the loop (basically, the cost of validation per turn) and how amenable the task/your code is to agents (which determines your P(success)).

Low P(success) isn't a problem if there's no engineer-time cost to validation: the agent can just grind the problem out in the background. And obviously, if P(success) is high, the cost of validation isn't a big deal. The productivity killer is when P(success) is low and the cost of validation is high; those circumstances can push you into the red with agents very quickly.

Thus the key to agents being a force multiplier is to focus on reducing validation costs, increasing P(success), and developing intuition about when to back off on pulling the slot machine in favor of more research. This is assuming you're speccing out what you're building so the agent doesn't make poor architectural/algorithmic choices that hamstring you down the line.
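
To put rough numbers on that trade-off, here is a back-of-envelope sketch (purely illustrative: the hour figures and the do-it-by-hand comparison point are made-up assumptions, and each attempt is modeled as an independent coin flip):

```python
# Illustrative model of the P(success) / validation-cost trade-off.
# All numbers here are made-up assumptions, not measurements.

def expected_agent_hours(p_success: float, validation_hours_per_attempt: float) -> float:
    """Expected engineer hours spent validating before one attempt succeeds.

    Each attempt is modeled as an independent Bernoulli trial, so the expected
    number of attempts is 1 / p_success, and the engineer pays the validation
    cost on every attempt, including the failed ones.
    """
    return (1.0 / p_success) * validation_hours_per_attempt

DIY_HOURS = 4.0  # hypothetical cost of just writing the change by hand

# High P(success), cheap validation: a clear win over 4 hours by hand.
print(expected_agent_hours(0.8, 0.25))   # 0.3125

# Low P(success) is fine when validation is nearly free (let it grind in the background).
print(expected_agent_hours(0.1, 0.05))   # 0.5

# Low P(success) *and* expensive validation: 7.5 hours, i.e. "into the red" vs DIY_HOURS.
print(expected_agent_hours(0.2, 1.5))    # 7.5
```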

  • > It's entirely dependent on the type of code being written. For verbose, straightforward code with clear-cut test scenarios, one agent running 24/7 can easily do the work of 20 FT engineers. This is a best-case scenario.

    So the "verbose, straightforward code with clear cut test scenarios" is already written by a human?

  • Respectfully, if I may offer constructive criticism, I’d hope this isn’t how you communicate with software developers, customers, prospects, or fellow entrepreneurs.

    To be direct, this reads like a fluff comment written by AI with an emphasis on probability and metrics. P(that) || that.

    I’ve written software used by everything from a local real estate company to the Mars Perseverance rover. AI is a phenomenally useful tool. But be wary of preposterous claims.

    • I'll take you at your word regarding "respectfully." That was an off-the-cuff attempt to explain the real levers that control the viability of agents under particular circumstances. The target market wasn't your average business potato but someone who might care about a hand-waved "order approximate" estimator, kind of like big-O notation, which is equally hand-wavy.

      Given that, if you want to revisit your comment in a constructive way rather than doing an empty drive-by, I'll read your words with an open mind.

I would say it varies from 0x to a modest 2x. It can help you write good code quickly, but I only spent about 20-30% of my time writing code anyway before AI. It definitely makes debugging and research tasks much easier as well. I would confidently say my job as a senior dev has gotten a lot easier and less stressful as a result of these tools.

One other thing I have seen, however, is the 0x case, where you've given too much control to the LLM and it codes both you and itself into Pan’s Labyrinth, and you end up having to take a weed whacker to the whole project or start from scratch.

  • Ok, if you're a senior dev, have you 'caught' it yet?

    Ask it a question about something you know well, and it'll give you garbage code that it's obviously copied from an answer on SO from 10 years ago.

    When you ask it for research, it's still giving you garbage, out-of-date information it copied from SO 10 years ago; you just don't know it's garbage.

    • That's why you don't use LLMs as a knowledge source without giving them tools.

      "Agents use tools in a loop to achieve a goal."

      If you don't give it any tools, you get hallucinations and half-truths.

      But give one a tool to do, say, web searches, and it's going to be a lot smarter. That's where 90% of the innovation with "AI" today is coming from. The raw models aren't getting that much smarter anymore, but the scaffolding and frameworks around them are.

      Tools are the main reason Claude Code is as good as it is compared to the competition.
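
      As a rough illustration of what "tools in a loop" means, here's a minimal sketch (everything in it is a stand-in: call_model is hard-coded and web_search is a hypothetical placeholder, not any real framework's API):

      ```python
      # Minimal "tools in a loop" sketch. The model and the tool are both stubs.

      def web_search(query: str) -> str:
          """Placeholder tool: a real agent would call an actual search API here."""
          return f"(search results for {query!r} would go here)"

      TOOLS = {"web_search": web_search}

      def call_model(messages: list[dict]) -> dict:
          """Stand-in for an LLM call that returns either a tool request or a final answer."""
          # A real model decides this; it's hard-coded so the sketch runs end to end.
          if not any(m["role"] == "tool" for m in messages):
              return {"tool": "web_search", "args": {"query": "python csv module docs"}}
          return {"answer": "Summary grounded in the tool output above."}

      def run_agent(goal: str, max_turns: int = 5) -> str:
          messages = [{"role": "user", "content": goal}]
          for _ in range(max_turns):
              reply = call_model(messages)
              if "answer" in reply:                                  # the model is done
                  return reply["answer"]
              result = TOOLS[reply["tool"]](**reply["args"])         # run the requested tool...
              messages.append({"role": "tool", "content": result})   # ...and loop with its output
          return "Gave up after max_turns."

      print(run_agent("How do I read a CSV file in Python?"))
      ```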

    • Of course, step one is always to think critically and evaluate for bad information. For research, I mainly use it for things that are testable/verifiable; for example, I used it for a tricky proxy chain setup. I did try to use it to learn a language a few months ago, which I think was counterproductive for the reasons you mentioned.

    • I use web search (DDG), and in the vast majority of cases I don’t think I’ve ever tried more than one query. Why? Because I know where the answer is; I’m using the search engine as an index to where I can find it. Like “csv python” to find that page in the docs.

> I mean from the off, people were claiming 10x probably mostly because it's a nice round number,

Purely anecdotal, but I've seen that level of productivity from the vibe tools we have in my workplace.

The main issue is that 1 engineer needs to have the skills of those 20 engineers so they can see where the vibe coding has gone wrong. Without that, it falls apart.

Could be that speed/efficiency was the wrong dimension to optimize for, and it's leading the industry down a bad path.

An LLM helps most with surface area. It expands the breadth of possibilities a developer can operate on.