Comment by leerob

4 hours ago

(I work at Cursor) When Composer 2.5 launched, we initially scored very competitively on AA's composite benchmark. I believe 3rd place overall. They have recently updated to use DeepSWE, which has more of a focus on very long-horizon tasks, and Composer isn't as good at those yet. We're aware and working on this for our next model.

Overall, some benchmarks show Composer doing well, others not so much. We think the model is very capable at the given price point. There's lots to improve! If you see any specific behaviors or places where the model isn't very good, lmk here or email me at lrobinson at cursor.com.

> We think the model is very capable at the given price point.

The "price point" comparison is a lie though because Composer is only available with a monthly Cursor subscription, and Cursor's external-model-per-token charges for other models are not representative of what other models' monthly subscribers get. An OpenAI $200 subscription gets you at least as much GPT 5.5 as a $200 Cursor subscription gets you Composer 2.5.

How does it compare to a $100 Claude subscription at $60? Especially in terms of how much of it I can use, because I haven't found anything in the US that gets me similar usage to Claude at $100 per month or less. I'm really open to alternatives.

Grok build only gave me roughly 10 hours of use for $40 for the entire month...

I don't even care about long horizon, can I use it a reasonable amount of time through the month? I use AI for hobby projects, Claude gets me quite far, but I tire of dropping $100 every month. I'm not sending my money to some Chinese firm that now has access to my computer.

Even with the new benchmark, Composer 2.5 seems to be just a bit worse than Opus 4.7. So I assume it's going to be roughly on par with Sonnet 5.0 at 1/6 of the cost.