Comment by SparkyMcUnicorn
5 months ago
It looks like Sonnet 3.7 (extended thinking) would be a better architect than R1.
I'll be trying out Sonnet 3.7 extended thinking + Sonnet 3.5 or Flash 2.0, which I assume would be at the top of the leaderboard.
Given that 3.5 and 3.7 cost the same, it doesn't make sense to use 3.5 here.
I'd like to see that benchmark, but R1 + 3.7 should be cheaper than 3.7T + 3.7.
The reason 3.5 (as the editor) makes more sense to me is the edit format success rate (99.6% vs 3.7's 93.3%).
Flash 2.0 got 100% on the edit format, and it's extremely cheap, so I'm pretty curious how that would score.
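To put the edit format numbers in perspective, here is a rough sketch of what they could mean in practice. It assumes (my assumption, not something the leaderboard states) that each percentage can be treated as an independent per-edit probability of producing a well-formed edit, and that a malformed edit is simply retried. The function names are made up for illustration.

```python
# Edit format success rates quoted in this thread, as fractions.
EDIT_FORMAT_SUCCESS = {
    "Sonnet 3.5": 0.996,
    "Sonnet 3.7": 0.933,
    "Flash 2.0": 1.000,
}


def malformed_rate(success_rate: float) -> float:
    """Fraction of edits expected to come back malformed."""
    return 1.0 - success_rate


def expected_attempts(success_rate: float) -> float:
    """Expected attempts per well-formed edit.

    Assumes independent retries (a geometric distribution), which is a
    simplification: real retries may be correlated with the task.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return 1.0 / success_rate


for name, p in EDIT_FORMAT_SUCCESS.items():
    print(
        f"{name}: {malformed_rate(p):.1%} malformed, "
        f"{expected_attempts(p):.3f} attempts per good edit"
    )
```

Under those assumptions 3.7 as the editor needs about 1.07 attempts per good edit versus about 1.004 for 3.5, so the gap is real but small per edit. Whether it matters overall depends on how the retry cost compares with the architect's cost, which this sketch does not model.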