Comment by pron

3 days ago

This future may come someday, but it's not here yet, not even with the post-late-'25 models. It assumes that agents can write reasonable code reliably, which they're currently far from doing (even 95% isn't enough, and I don't think we're at 95%).

Anthropic's C compiler experiment showed that even in a situation where people give the agent every imaginable advantage (above and beyond what is feasible in most projects) — i.e., providing not only a very precise specification but also thousands of tests and a reference implementation to use as an oracle, with the model itself trained on that reference implementation, years of "preparation" effort — so that all the agent has to do is code, it still fails on a task that is certainly not trivial but also by no means monumental.

A lot of writing about agentic coding seems to assume that today's agents have coding down, whereas the experience of anyone using them across different kinds of software work, as well as tests by the labs themselves, shows that this is not yet true.