
Comment by simonw

3 days ago

The previous model retroactively becomes not as good as the best available models. I don't think that's a huge surprise.

The surprise is the implication that the crossover between net-negative and net-positive impact happened to fall within the last 4 months, given that the initial release was 2 years ago and there has been enough public attention for a study to be funded and completed.

Yes, it might make a difference, but it is a little tiresome that there's always a “this is based on a model that is x months old!” comment, because it will always be true: an academic study does not get funded, executed, written up, and published in less time.

  • Some of it is just that (probably different) people said the same damn things 6 months ago.

    "No, the 2.8 release is the first good one. It massively improves workflows"

    Then, 6 months later, the study comes out.

    "Ah man, 2.8 was useless, 3.0 really crossed the threshold on value add"

    At some point, you roll your eyes and assume it is just snake oil sales

    • There are a lot of confounding factors here. For example, you could point to any of these things in the last ~8 months as significant changes:

      * the release of agentic workflow tools

      * the release of MCPs

      * the release of new models, Claude 4 and Gemini 2.5 in particular

      * subagents

      * asynchronous agents

      Any or all of these could have had a big or small impact. For example, I’m big on agentic tools, skeptical of MCPs, and don’t think we yet understand subagents. That’s different from those who, for example, think MCPs are the future.

      > At some point, you roll your eyes and assume it is just snake oil sales

      No, you have to realize you’re talking to a population of people, and not necessarily the same person. Opinions are going to vary; they’re not literally the same person each time.

      There are surely snake oil salesmen, but you can’t buy anything from me.


    • Or you accept that different people have different skill levels, workflows and goals, and therefore the AIs reach usability at different times.


That's not the argument being made, though. The argument is that it does "work" now, implying that it didn't quite work before; except that the same people say the same thing for every model release, including at the release of the previous one, which is now acknowledged to be seriously flawed, and they will say it again for the future one, at which point the current models will similarly be acknowledged to be not only less performant than the future models, but inherently flawed.

Of course it's possible that at some point you get a model that really works, irrespective of the history of false claims from the zealots, but that history does mean you should take their comments with a grain of salt.

  • > That's not the argument being made, though. The argument is that it does "work" now, implying that it didn't quite work before

    Right.

    > except that the same people say the same thing for every model release,

    I did not say that, no.

    I am sure you can find someone who is in a Groundhog Day about this, but it’s just simpler than that: as tools improve, more people find them useful than before. You’re not talking to the same people, you are talking to new people each time who now have had their threshold crossed.

    • > You’re not talking to the same people, you are talking to new people each time who now have had their threshold crossed.

      no, it's the same names, again and again
