
Comment by simonw

3 days ago

The previous model retroactively becomes not as good as the best available models. I don't think that's a huge surprise.

The surprise is the implication that the crossover between net-negative and net-positive impact happened to fall within the last 4 months, given that the initial release was 2 years ago and there has been enough public attention for a study to be funded and completed.

Yes, it might make a difference, but it is a little tiresome that there's always a “this is based on a model that is x months old!” comment, because it will always be true: an academic study does not get funded, executed, written up, and published in less time.

  • Some of it is just that (probably different) people said the same damn things 6 months ago.

    "No, the 2.8 release is the first good one. It massively improves workflows"

    Then, 6 months later, the study comes out.

    "Ah man, 2.8 was useless, 3.0 really crossed the threshold on value add"

    At some point, you roll your eyes and assume it is just snake oil sales

    • There are a lot of confounding factors here. For example, you could point to any of these things in the last ~8 months as significant changes:

      * the release of agentic workflow tools

      * the release of MCPs

      * the release of new models, Claude 4 and Gemini 2.5 in particular

      * subagents

      * asynchronous agents

      Any or all of these could have had a big or small impact. For example, I’m big on agentic tools, skeptical of MCPs, and don’t think we yet understand subagents. That’s different from those who, for example, think MCPs are the future.

      > At some point, you roll your eyes and assume it is just snake oil sales

      No, you have to realize you’re talking to a population of people, and not necessarily the same person. Opinions are going to vary; they’re not literally the same person each time.

      There are surely snake oil salesmen, but you can’t buy anything from me.


    • Or you accept that different people have different skill levels, workflows and goals, and therefore the AIs reach usability at different times.


That's not the argument being made, though. The argument is that it does "work" now, implying that it didn't quite work before; except that the same people say the same thing for every model release, including at the release of the previous one, which is now acknowledged to be seriously flawed, and they will say it again for the future one, at which point the current models will similarly be acknowledged to be not only less performant than the future models, but inherently flawed.

Of course it's possible that at some point you get a model that really works, irrespective of the history of false claims from the zealots, but that history does mean you should take their comments with a grain of salt.

  • > That's not the argument being made, though. The argument is that it does "work" now, implying that it didn't quite work before

    Right.

    > except that the same people say the same thing for every model release,

    I did not say that, no.

    I am sure you can find someone who is in a Groundhog Day about this, but it’s just simpler than that: as tools improve, more people find them useful than before. You’re not talking to the same people, you are talking to new people each time who now have had their threshold crossed.

    • > You’re not talking to the same people, you are talking to new people each time who now have had their threshold crossed.

      no, it's the same names, again and again
