Comment by afro88
4 days ago
Great post, and it sums up my recent experience with Cursor. There has been a recent jump in effectiveness, and it's articulated well late in the post:
> The answer is a critical chunk of the work for making agents useful is in the training process of the underlying models. The LLMs of 2023 could not drive agents, the LLMs of 2025 are optimized for it. Models have to robustly call the tools they are given and make good use of them. We are only now starting to see frontier models that are good at this. And while our goal is to eventually work entirely with open models, the open models are trailing the frontier models in our tool calling evals. We are confident the story will change in six months, but for now, useful repeated tool calling is a new feature for the underlying models.
So yes, a software engineering agent is a simple for-loop. But it can only be a simple for-loop because the models have been trained really well for tool use.
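Concretely, the loop the post describes looks something like this. A minimal sketch in Python; `call_model`, the message format, and the `run_shell` tool are hypothetical stand-ins for whatever tool-calling API the underlying model exposes, not any specific vendor's:

```python
import json
import subprocess

def run_shell(command: str) -> str:
    """One example tool: run a shell command and capture its output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

TOOLS = {"run_shell": run_shell}

def call_model(messages: list[dict]) -> dict:
    """Hypothetical LLM call. Assumed to return either
    {"type": "answer", "text": ...} or
    {"type": "tool_call", "name": ..., "arguments": {...}}."""
    raise NotImplementedError("swap in a real tool-calling model here")

def agent(task: str, max_steps: int = 20) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):          # the whole "agent" is this loop
        reply = call_model(messages)
        if reply["type"] == "answer":   # model decided it is done
            return reply["text"]
        # Otherwise run the requested tool and feed the result back;
        # a failed call just becomes context for the next iteration.
        tool = TOOLS[reply["name"]]
        try:
            output = tool(**reply["arguments"])
        except Exception as exc:
            output = f"tool error: {exc}"
        messages.append({"role": "tool", "content": json.dumps(output)})
    return "step limit reached"
```

All the hard work is hidden inside `call_model`: whether the loop converges depends entirely on how reliably the model picks tools, formats arguments, and recovers from errors, which is exactly the training-time property the post is pointing at.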
In my experience Gemini 2.5 Pro was the first to show promise here, and Claude Sonnet 4 / Opus 4 are both a jump up in quality. It's very rare that a tool call fails, and even rarer that the model can't resolve the failure on the next loop.