Right now I think there is an edge to how you construct prompts and config files. There is a large difference between "modify f() to do..." and "modify f() to do... Review the current variables and make sure they are still used consistently with their naming. Look for unreachable and dead code. Examine callers and called functions for side effects from the introduced changes...".
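For illustration, the gap between those two prompts can be closed mechanically by appending a fixed review checklist to every task. This is a minimal Python sketch; `build_prompt` and `REVIEW_CHECKLIST` are hypothetical names, and the checklist items are simply the ones quoted in the comment.

```python
from typing import Sequence

# The review steps quoted in the comment, kept as reusable data (hypothetical name).
REVIEW_CHECKLIST = (
    "Review the current variables and make sure they are still used consistently with their naming.",
    "Look for unreachable and dead code.",
    "Examine callers and called functions for side effects from the introduced changes.",
)


def build_prompt(task: str, checklist: Sequence[str] = REVIEW_CHECKLIST) -> str:
    """Turn a bare task ("modify f() to do ...") into the longer prompt
    by appending each checklist item as a bullet."""
    lines = [task.strip(), "", "After making the change:"]
    lines.extend(f"- {item}" for item in checklist)
    return "\n".join(lines)


print(build_prompt("Modify f() to return None on empty input."))
```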
I don't think that will make much difference in a year.
I'm increasingly convinced of the opposite. IMO Fable was about as capable as Opus for my day-to-day work.
I think there's a pretty good chance that we've reached the point of diminishing returns, for our specific use case.
There are still like a billion other (more difficult) use cases to be tackled, but I think "generating code" has gotten good enough that other bottlenecks will prevent further exponential progress on this specific task.
That's not going away.
It's already going away for me in a sense as I build up a library of AGENTS.md and Codex skills. I see no reason such things won't get baked in at the agent layer so that domain-specific rules and such are automatically applied when appropriate.
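A toy sketch of what "automatically applied when appropriate" might look like: rules keyed by a file glob, selected by which files a change touches. This is an assumption for illustration only; `Rule`, `applicable_rules`, and the example rules are hypothetical, and this is not how Codex or AGENTS.md files are actually loaded.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import List, Sequence


@dataclass(frozen=True)
class Rule:
    pattern: str  # glob over file paths this rule applies to (hypothetical scheme)
    text: str     # instruction to inject into the agent's prompt


def applicable_rules(rules: Sequence[Rule], touched_files: Sequence[str]) -> List[str]:
    """Return the text of every rule whose glob matches at least one
    touched file, in rule order, without duplicates."""
    selected: List[str] = []
    for rule in rules:
        if any(fnmatch(path, rule.pattern) for path in touched_files) and rule.text not in selected:
            selected.append(rule.text)
    return selected


# Hypothetical domain-specific rules.
RULES = [
    Rule("*.py", "Run mypy before finishing."),
    Rule("migrations/*", "Never edit a migration that has already been applied."),
    Rule("*.sql", "Use parameterized queries."),
]

print(applicable_rules(RULES, ["app/models.py", "migrations/0042_add_index.py"]))
```

Only the rules relevant to the change reach the prompt, so the library can grow without every task paying for every rule.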
Yeah, uh, why would it go away? In what world do you completely surrender your ability to control the work product, the methods for achieving said work product, etc. That is the dream of a PHB.
Who's to say it won't?
Not OP, but I generally agree. Models are powerful enough now to reliably instruct other models. They don’t need fancy tools or IDEs, just the command line.
With deterministic workflows, type-safe languages and test suites, agentic loops pretty much “can’t fail”. They will continue until the types resolve, the tests pass, and the project requirements are deterministically met.
By that point it’s literally just a case of typing a prompt into a text field, and waiting.
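The loop being described can be sketched roughly as follows, assuming a hypothetical `attempt` callback that asks a model to edit the working tree and a `check` callback that runs the type checker and test suite. All names here are illustrative. Note that the loop only guarantees the checks pass (or the iteration budget runs out); whether the checks capture the real requirements is a separate matter.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class CheckResult:
    passed: bool
    feedback: str  # checker output fed back to the model on failure


def run_commands(commands: Sequence[Sequence[str]]) -> CheckResult:
    """Run check commands in order (e.g. a type checker, then a test runner)
    and stop at the first one that exits non-zero."""
    for cmd in commands:
        proc = subprocess.run(list(cmd), capture_output=True, text=True)
        if proc.returncode != 0:
            return CheckResult(False, proc.stdout + proc.stderr)
    return CheckResult(True, "")


def agent_loop(
    task: str,
    attempt: Callable[[str, List[str]], None],  # hypothetical: model edits the working tree
    check: Callable[[], CheckResult],
    max_iters: int = 10,
) -> Tuple[bool, int]:
    """Repeat attempt -> check until the checks pass or the budget runs out.
    Returns (passed, number_of_attempts)."""
    feedback: List[str] = []
    for i in range(1, max_iters + 1):
        attempt(task, feedback)
        result = check()
        if result.passed:
            return True, i
        feedback.append(result.feedback)  # accumulated errors go into the next attempt
    return False, max_iters


# Stand-in "model" that only satisfies the checker on its third try.
state = {"tries": 0}


def fake_attempt(task: str, feedback: List[str]) -> None:
    state["tries"] += 1


def fake_check() -> CheckResult:
    ok = state["tries"] >= 3
    return CheckResult(ok, "" if ok else f"test_f failed (try {state['tries']})")


print(agent_loop("modify f()", fake_attempt, fake_check))
```

In a real setup, `check` might be `lambda: run_commands([["mypy", "."], ["pytest", "-q"]])`; the `max_iters` cap is what stops a loop that never converges from running forever.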
"project requirements are deterministically met" makes it sound so easy
It works great in dynamic languages as well. Static typing is mostly to aid the IDE. In dynamic languages you can infer the type by looking at the code, and LLMs are good at that.
This seems true to me in theory, but not in practice.