
Comment by steveklabnik

3 days ago

Sorry, that’s not my take. I didn’t think these tools were useful until the latest set of models, that is, they crossed the threshold of usefulness to me.

Even then though, “technology gets better over time” shouldn’t be surprising, as it’s pretty common.

Do you really see a massive jump?

For context, I've been using AI, a mix of OpenAI and Claude, for over a year now, mainly for bashing out quick React stuff. For anything else it's generally rubbish and slower than working without it. I still use it to rubber duck, though, so I'm still seeing the level of quality on the backend side.

I'd say they're only marginally better today than they were even 2 years ago.

Every time a new model comes out you get a bunch of people raving about how great the new one is, and I honestly can't tell the difference. The only real difference is that reasoning models actually slowed everything down, but now I see its reasoning. That's only useful because I often spot it leaving important stuff out of the final answer.

  • The massive jump in the last six months is that the new set of "reasoning" models got really good at reasoning about when to call tools, and were accompanied by a flurry of tools-in-loop coding agents - Claude Code, OpenAI Codex, Cursor in Agent mode etc.

    An LLM that can test the code it is writing and then iterate to fix the bugs turns out to be a huge step forward from LLMs that just write code without trying to then exercise it.
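
    To make that concrete, here's a minimal sketch of the loop (Python, with a hypothetical llm_write_code standing in for the model call - not any particular vendor's API):

        import subprocess

        def llm_write_code(prompt: str) -> None:
            """Hypothetical: send the prompt to a model and apply its edits."""
            ...

        def run_tests() -> tuple[bool, str]:
            # Exercise the code the model just wrote.
            result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
            return result.returncode == 0, result.stdout + result.stderr

        def agent_loop(task: str, max_iterations: int = 5) -> bool:
            prompt = task
            for _ in range(max_iterations):
                llm_write_code(prompt)        # model edits files via tool calls
                passed, output = run_tests()  # the step older LLMs never took
                if passed:
                    return True               # tests green - stop iterating
                prompt = f"{task}\n\nTests failed with:\n{output}\nFix and retry."
            return False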

  • I've gone from asking the tools how to do things and cutting and pasting the (often small) bits that'd be helpful, through using assistants whose every decision I'd review (often having to start over), to now often starting an assistant with broad permissions and just reviewing the diff later - after it has made the changes pass the test suite, run a linter and fixed all the issues it brought up, and written a draft commit message.

    The jump has been massive.

  • > but now I see its reasoning

    It's not showing its reasoning. "Reasoning" models are trained to output more tokens in the hope that more tokens means fewer hallucinations.

    It's just a marketing trick and there is no evidence this sort of fake ""reasoning"" actually gives any benefit.

  • Yes. In January I would have told you AI tools are bullshit. Today I’m on the $200/month Claude Max plan.

    As with anything, your mileage may vary: I’m not here to tell anyone that thinks they still suck that their experience is invalid, but to me it’s been a pretty big swing.

    • > In January I would have told you AI tools are bullshit. Today I’m on the $200/month Claude Max plan.

      Same. For me the turning point was VS Code’s Copilot Agent mode in April. That changed everything about how I work, though it had a lot of drawbacks due to its glitches (many of these were fixed within 6 or so weeks).

      When Claude Sonnet 4 came out in May, I could immediately tell it was a step-function increase in capability. It was the first time an AI, faced with ambiguous and complicated situations, would be willing to answer a question with a definitive and confident “No”.

      After a few weeks, it became clear that VS Code’s interface and usage limits were becoming the bottleneck. I went to my boss, bullet points in hand, and easily got approval for the Claude Max $200 plan. Boom, another step-function increase.

      We’re living in an incredibly exciting time to be a skilled developer. I understand the need to stay skeptical and measure the real benefits, but I feel like a lot of people are getting caught up in the culture war aspect and are missing out on something truly wonderful.

    • Ok, I'll have to try it out then. I've got a side project that's 3/4 finished and I'll let it loose on it.

      So are you using Claude Code via the Max plan, Cursor, or what?

      I think I'd definitely hit AI news exhaustion and was viewing people raving about this agentic stuff as yet more AI fanbois. I'd just continued using the AI separately, as setting up a new IDE seemed like too much work for the fractional gains I'd been seeing.


  • I see a massive jump every time.

    Just two years ago, this failed.

    > Me: What language is this: "esto está escrito en inglés"

    > LLM: English

    (That's Spanish for "this is written in English"; the right answer was Spanish.)

    Gemini and Opus have solved questions that took me weeks to solve myself. And I'll feed some complex code into each new iteration and it will catch a race condition I missed even with testing and line-by-line scrutiny.

    Consider how many more years of experience a software engineer needs to catch hard race conditions just from reading code, compared with someone who couldn't do it after a hundred tries. We take it for granted already since we see it as "it caught it or it didn't", but these are massive jumps in capability.
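
    To illustrate the kind of bug in question, here's a toy check-then-act race in Python - obvious at this size, but the same shape hides easily in a big codebase (the sleep just widens the race window for the demo):

        import threading
        import time

        balance = 100

        def withdraw(amount: int) -> None:
            global balance
            if balance >= amount:    # both threads can pass this check...
                time.sleep(0.01)     # ...before either one subtracts (demo only)
                balance -= amount    # ...so the account goes negative

        threads = [threading.Thread(target=withdraw, args=(80,)) for _ in range(2)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        print(balance)  # 20 if the checks serialized, -60 when they race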

Wait until the next set. You will find the previous ones weren't useful after all.

  • This makes no sense to me. I’m well aware that I’m getting value today; that’s not going to change in the future: it’s already happened.

    Sure, they may get even more useful in the future, but that doesn’t change my present.