Comment by cwillu

3 days ago

The surprise is the implication that the crossover between net-negative and net-positive impact happened to be in the last 4 months, in light of the initial release 2 years ago and sufficient public attention for a study to be funded and completed.

Yes, it might make a difference, but it is a little tiresome that there's always a “this is based on a model that is x months old!” comment, because it will always be true: an academic study does not get funded, executed, written up, and published in less time.

Some of it is just that (probably different) people said the same damn things 6 months ago.

"No, the 2.8 release is the first good one. It massively improves workflows"

Then, 6 months later, the study comes out.

"Ah man, 2.8 was useless, 3.0 really crossed the threshold on value add"

At some point, you roll your eyes and assume it is just snake oil sales

  • There are a lot of confounding factors here. For example, you could point to any of these things in the last ~8 months as being significant changes:

    * the release of agentic workflow tools

    * the release of MCPs

    * the release of new models, Claude 4 and Gemini 2.5 in particular

    * subagents

    * asynchronous agents

    All or any of these could have made for a big or small impact. For example, I’m big on agentic tools, skeptical of MCPs, and don’t think we yet understand subagents. That’s different from those who, for example, think MCPs are the future.

    > At some point, you roll your eyes and assume it is just snake oil sales

    No, you have to realize you’re talking to a population of people, and not necessarily the same person. Opinions are going to vary, they’re not literally the same person each time.

    There are surely snake oil salesmen, but you can’t buy anything from me.

    • > you have to realize you’re talking to a population of people, and not necessarily the same person. Opinions are going to vary, they’re not literally the same person each time.

      I pointed this out in my post for a reason. I get it. But even if a different person is saying the same thing every time a new release comes out, the effect on my prior is the same.

  • Or you accept that different people have different skill levels, workflows and goals, and therefore the AIs reach usability at different times.

    • The complication is that, as noted in the above paper, _people are bad at self-reporting on whether the magic robot works for them_. Just because someone _believes_ they are more effective using LLMs is not particularly strong evidence that they actually are.