Comment by dwohnitmok

5 days ago

> I don't know if any of this applies to the arguments in my article, but most of the point of it is that progress in code production from LLMs is not a consequence of better models (or fine tuning or whatever), but rather of a shift in how LLMs are used: in agent loops with access to ground truth about whether things compile and pass automatic acceptance.

I very strongly disagree with this, and I think it reflects a misunderstanding of model capabilities. This sort of agentic loop with access to ground truth has been tried in one form or another ever since GPT-3 came out. For four years these loops didn't work: models would very quickly veer into incoherence no matter what tooling you gave them.

Only in the last year or so have models gotten capable enough to maintain coherence over long enough time scales that these loops work. And future model releases will tighten up these loops even more and scale them out to longer time horizons.

This is all to say that progress in code production has been essentially driven by progress in model capabilities, and agent loops are a side effect of that rather than the main driving force.

Sure! I'm super happy to hear these kinds of objections. All the progress I'm personally perceiving is traceable to decisions different agent frameworks seem to be making, but I'm totally open to the idea that model improvements have been instrumental in making these loops actually converge on anything practical. Near the core of my argument is simply the idea that we've crossed a threshold where current models plus these kinds of loops actually do work.