
Comment by tombert

1 day ago

Sure, but think about what it's replacing.

If you hire a human, it will cost you thousands a week. Humans will also fail at basic tasks, get stuck in useless loops, and you still have to pay them for all that time.

For that matter, even if I'm not hiring anyone, I will still get stuck on projects and burn through the finite number of hours I have on this planet trying to figure stuff out and being wrong for a lot of it.

It's not perfect yet, but these coding models have, in my mind, gotten pretty good if you're specific about the requirements, and even if they misfire fairly often, they can still be useful.

I've made this analogy before, but to me they're like really eager-to-please interns; not necessarily perfect, and there's even a fairly high risk you'll have to redo a lot of their work, but they can still be useful.

I am an AI skeptic, but I would agree this looks impressive from certain angles, especially if you're an early startup (maybe) or you are very high up the chain and just want to focus on cutting costs. On the other hand, if you are about to be unemployed, this is less impressive. Can it replace a human? I would say no, it still has a long way to go, but a good salesman can convince executives that it can, and that's all that matters.

  • > On the other hand, if you are about to be unemployed, this is less impressive

    > salesman can convince executives that it does

    I tend to think that reality will temper this trend as the results develop. Replacing 10 engineers with one engineer using Cursor will result in a vast velocity hit. Replacing 5 engineers with 5 "agents" assigned to autonomously implement features will result in a mess eventually. (With current technology -- I have no idea what even 2027 AI will do). At that point those unemployed engineers will find their phones ringing off the hook to come and clean up the mess.

    Not unlike what happens in many situations where companies fire teams and offshore the whole thing to a team of average developers 180 degrees of longitude away who don't have any domain knowledge of the business or connections to the stakeholders. The pendulum swings back in the other direction.

  • I just think Jevons paradox [1]/Gustafson's law [2] kind of applies here.

    Maybe I shouldn't have used the word "replaced", as I don't really think it's actually going to "replace" people long term. I think it's likely to just lead to higher output as these get better and better.

    [1] https://en.wikipedia.org/wiki/Jevons_paradox

    [2] https://en.wikipedia.org/wiki/Gustafson%27s_law

    • Not you, but the word "replaced" is being used all the time. Even senior engineers are saying they use it as a junior engineer, while we could easily hire junior engineers (but execs don't want to). Jevons paradox won't work in software because users' wallets and time are limited, and if software becomes too easy to build, it becomes harder to sell. Normal people can have 5 subscriptions, maybe 10, but they won't go to 50 or 100. I would say we've already exhausted users, with all the bad practices.

You’ve missed my point here - I agree that gen AI has changed everything and is useful, _but_ I disagree that it’s improved substantially - which is what the comment I replied to claimed.

Anecdotally I’ve seen no difference from model changes in the last year, but going from a plain LLM to Claude Code (where we told the LLMs they can use tools on our machines) was a game changer. The improvement there was the agent loop and the support for tools.

In 2023 I asked v0.dev to one shot me a website for a business I was working on and it did it in about 3 minutes. I feel like we’re still stuck there with the models.

  • My experience in 2024 with AI tools like Copilot was that if the code compiled the first time, it was an above-average result, and I’d still need a lot of manual tweaking.

    There were definitely languages where it worked better (JS), but if I told people here I had to spend a lot of time tweaking afterwards, at least half of them assumed I was being really anal about spacing or variable names, which was simply not the case.

    It’s still the case for cheaper models (GPT-mini remains a waste of my time), but there are mid-level models like Minimax M2 that can produce working code, and stuff like Sonnet can produce usable code.

    I’m not sure the delta is enough for me that I’d pay for these tools on my own though…

  • I've been coding with LLMs for less than a year. As I mentioned to someone in email a few days ago: for the first half of that time, when an LLM solved a problem differently from me, I would probe why and more often than not overrule it and instruct it to do it my way.

    Now it's reversed. More often than not its method is better than mine (e.g. leveraging a better function/library than I would have).

    In general, it's writing idiomatic code much more often. It's been many months since I had to correct it and tell it to be idiomatic.

  • In my experience it has gotten considerably better. When I get it to generate C, it often gets the pointer logic correct, which wasn't the case three years ago. Three years ago, ChatGPT would struggle with even fairly straightforward LaTeX, but now I can pretty easily get it to generate quite elaborate LaTeX, and I have even had good success generating LuaTeX. I've been able to fairly successfully have it generate a TLA+ spec from existing code now, which didn't work even a year ago when I tried it.

    Of course, sample size of one, so if you haven't gotten those results then fair enough, but I've at least observed it getting a lot better.