Comment by imiric

2 months ago

The idea that this technology isn't useful is as ignorant as thinking that there is no "AI" bubble.

Of course there is a bubble. We can see it whenever these companies tell us this tech is going to cure diseases, end world hunger, and bring global prosperity; whenever they tell us it's "thinking", can "learn skills", or is "intelligent", for that matter. Companies will absolutely devalue and the market will crash when the public stops buying the snake oil they're being sold.

But at the same time, a probabilistic pattern recognition and generation model can indeed be very useful in many industries. Many of our problems can be approached by framing them in terms of statistics, and throwing data and compute at them.

So now that we've established that, and we're reaching diminishing returns of scaling up, the only logical path forward is to do some classical engineering work, which has been neglected for the past 5+ years. This is why we're seeing the bulk of gains from things like MCP and, now, "agents".

8 comments

imiric

NitpickLawyer 2 months ago

> This is why we're seeing the bulk of gains from things like MCP and, now, "agents".

This is objectively not true. The models have improved a ton (with data from "tools" and "agentic loops", but it's still the models that become more capable).

Check out [1] a 100 LoC "LLM in a loop with just terminal access", it is now above last year's heavily harnessed SotA.

> Gemini 3 Pro reaches 74% on SWE-bench verified with mini-swe-agent!

[1] - https://github.com/SWE-agent/mini-swe-agent

imiric 2 months ago
I don't understand. You're highlighting a project that implements an "agent" as a counterargument to my claim that the bulk of improvements are from "agents"?
Sure, the models themselves have improved, but not by the same margins from a couple of years ago. E.g. the jump from GPT-3 to GPT-4 was far greater than the jump from GPT-4 to GPT-5. Currently we're seeing moderate improvements between each release, with "agents" taking up center stage. Only corporations like Google are still able to squeeze value out of hyperscale, while everyone else is more focused on engineering.
- losvedir 2 months ago
  
  They're pointing out that the "agent" is just 100 lines of code with a single tool. That means the model itself has improved, since such a bare bones agent is little more than invoking the model in a loop.
  
  2 replies →
- IanCal 2 months ago
  
  I think the point here is that it’s not adding agents on top but the improvements in the models allow the agentic flow.
  
  2 replies →