Comment by leerob
3 hours ago
(I work at Cursor) We score well on Terminal-Bench and SWE-bench Multilingual. On DeepSWE we're not so great yet, as it's geared more toward very long-horizon tasks. We're planning to include more public benchmarks in our next model release.