Comment by deaux
19 hours ago
1. Good point, didn't know about the timeouts, that's rough for the benchmarks. Though IMO they don't necessarily need to be "SWE-official" to have value, if the only difference is disabling those.
3. Maybe you could instead provide a measure of output tokens used (including thinking), as that's a reasonable proxy for speed. I guess input tokens would be similar unless the AST usage, hashes, etc. increase them a lot? Seems unlikely.