Comment by losvedir

2 days ago

They're pointing out that the "agent" is just 100 lines of code with a single tool. That means the model itself has improved, since such a bare-bones agent is little more than invoking the model in a loop.

That doesn't make sense, considering that the idea of an "agentic workflow" is essentially to invoke the model in a loop. It could probably be done in much less than 100 lines.
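
For concreteness, here's roughly what such a loop looks like. This is just a minimal sketch, not the code from the article: call_model is a stand-in for whatever chat-completion API you're using, the message format is made up, and the single tool is a shell command run via subprocess.

```python
import subprocess

def call_model(messages):
    """Placeholder for whatever chat API you're calling.
    Assumed to return either {"type": "final", "text": ...} or
    {"type": "run_command", "command": ...} parsed out of the
    model's reply. The details depend entirely on the provider."""
    raise NotImplementedError

def run_agent(task, max_steps=20):
    # The whole "agent": call the model, execute the one tool it can
    # ask for (a shell command), feed the output back, repeat.
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if reply["type"] == "final":
            return reply["text"]
        result = subprocess.run(
            reply["command"], shell=True, capture_output=True, text=True
        )
        messages.append({"role": "assistant", "content": str(reply)})
        messages.append({"role": "user", "content": result.stdout + result.stderr})
    return "step limit reached"
```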

This doesn't refute the fact that this simple idea can be very useful, especially since the utility doesn't come from invoking the model in a loop but from integrating it with external tools and APIs, all of which requires much more code.

We've known for a long time that feeding the model high-quality contextual data can improve its performance. This is essentially what "reasoning" is. So it's no surprise that doing that repeatedly, from accurate external sources, does the same thing.

In order to back up GP's claim, they should compare models from a few years ago with modern non-reasoning models in a non-agentic workflow. Again, I'm not saying models haven't improved, just that the improvements have been much more marginal than before. It's surprising how many discussions derail because someone chooses to argue against a point that wasn't being made.

  • The original point was that the previous SotA was a "heavily harnessed" agent, which I took to mean it had more tools at its disposal and perhaps some code to manage context and so on. The fact that the model can do it now with just 100 LoC and a terminal tool implies the model itself has improved. It's gotten better at standard terminal commands at least, and possibly has a bigger context window or uses the data in its context window more effectively.

    Those are improvements to the model, albeit in service of agentic workflows. I consider that distinct from improvements to agents themselves, which are things like MCP, context management, etc.