Comment by simonw

4 days ago

The massive jump in the last six months is that the new set of "reasoning" models got really good at reasoning about when to call tools, and were accompanied by a flurry of tools-in-loop coding agents - Claude Code, OpenAI Codex, Cursor in Agent mode, etc.

An LLM that can test the code it is writing and then iterate to fix the bugs turns out to be a huge step forward from LLMs that just write code without trying to then exercise it.
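
To make that write-test-fix loop concrete, here is a minimal sketch in Python. It is not how Claude Code, Codex, or Cursor are actually implemented. The names `run_candidate`, `agent_loop`, and `fake_model` are hypothetical, and `fake_model` is a hard-coded stand-in for a real LLM call so the example runs on its own.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional


def run_candidate(code: str, test: str) -> tuple[bool, str]:
    """Run the candidate code plus its test in a fresh interpreter.

    Returns (passed, stderr). This is the "tool" the model gets to call.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code + "\n\n" + test + "\n")
        proc = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            timeout=30,
        )
    return proc.returncode == 0, proc.stderr


def agent_loop(
    propose: Callable[[str], str], test: str, max_rounds: int = 3
) -> tuple[Optional[str], int]:
    """Ask for code, run the test, and feed any failure back to the model."""
    feedback = ""
    for round_number in range(1, max_rounds + 1):
        code = propose(feedback)
        passed, feedback = run_candidate(code, test)
        if passed:
            return code, round_number
    return None, max_rounds


def fake_model(feedback: str) -> str:
    """Hypothetical stand-in for an LLM: buggy first try, fixed after an error."""
    if not feedback:
        return "def add(a, b):\n    return a - b"  # deliberate bug
    return "def add(a, b):\n    return a + b"


if __name__ == "__main__":
    code, rounds = agent_loop(fake_model, "assert add(2, 3) == 5")
    print(f"passed after {rounds} round(s):\n{code}")
```

The first attempt fails the assertion, the traceback is passed back as feedback, and the second attempt passes. A model that only writes code and never runs it would have stopped at the buggy first version.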