Comment by mi_lk

6 hours ago

Cursor: Find me another benchmark where Composer 2.5 is a top 10 frontier coding model

(I work at Cursor) We score well on Terminal-Bench and SWE-bench Multilingual. Our DeepSWE results aren't great yet, since that benchmark targets very long-horizon tasks. We're planning to include more public benchmarks in our next model release.