Comment by qazxcvbnmlp

25 days ago

How do you choose which loss function over time to pursue?

Honestly, it's empirical. We started with what was easiest to measure: human correction rate. If I had to step in and fix something, that's a clear signal the agent took a bad path. Iterations and reverts turned out to be noisier: sometimes a high iteration count means the task was genuinely hard, not that the agent made a mistake. So we downweighted those. The meta-answer is: pick the metric that most directly captures "I wish the agent hadn't done that." For us that's human intervention. For a team with better test coverage, it might be test failures after commit. For infra work, maybe rollback frequency. There's no universal loss function; it depends on where your pain actually is. We just made it explicit and started logging it. The logging alone forced clarity.
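
To make that concrete, here is a minimal sketch of what logging such a loss per task could look like. The field names, the weights, and the `TaskLog` / `task_loss` / `mean_loss` helpers are all illustrative assumptions, not the actual setup described above; the only idea taken from the comment is that human corrections count fully and the noisier signals are downweighted.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical weights: human intervention is the primary signal,
# reverts and iterations are noisier and therefore downweighted.
WEIGHTS: Dict[str, float] = {
    "human_corrections": 1.0,
    "reverts": 0.3,
    "iterations": 0.1,
}


@dataclass
class TaskLog:
    """One logged agent task (all fields are assumed, illustrative names)."""
    task_id: str
    human_corrections: int = 0
    reverts: int = 0
    iterations: int = 0


def task_loss(log: TaskLog, weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the logged signals for a single task."""
    return sum(w * getattr(log, name) for name, w in weights.items())


def mean_loss(logs: Iterable[TaskLog]) -> float:
    """Average loss across tasks; returns 0.0 when nothing was logged."""
    logs = list(logs)
    if not logs:
        return 0.0
    return sum(task_loss(log) for log in logs) / len(logs)
```

A team whose pain shows up as test failures after commit or as rollbacks would swap in those counters and pick its own weights; the point of the sketch is only that the choice gets written down and logged.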