Comment by amluto

2 months ago

If I’m coordinating a large codebase, I expect the people I’m coordinating to be capable of learning and improving over time. Coding agents cannot (currently) do this.

I wonder if a very lightweight RL loop built around the user could work well enough to help the situation. As I understand it, though, current LLMs generally do not learn fast enough for a single bad RL example plus one (prompted?) better example to produce improvement at anywhere near human speed.