Comment by veselin

1 day ago

I am talking about SWE-bench style problems, where Todo doesn't help except for enabling more parallelism.

Was guessing that; coding tasks are a valuable but myopic lens :)

I'm guessing a self-updating plan is sufficient there. I'm not actually convinced that today's plan <> todo-list flow makes sense: in the linked PLAN.md, the two get unified, and that's how we do AI coding. I don't have evals on this, but from a year of vibe coding/engineering, that's what we've reached experientially across frontier coding models and tools. Nowadays we're mixing in evals too, but that's a more complicated story.