Comment by doctoboggan
5 months ago
Have you tried Claude 3.7 with DeepSeek as the architect? Since "DeepSeek R1 + claude-3-5-sonnet-20241022" is in second place, "DeepSeek R1 + claude-3-7" would hopefully be the highest-ranking combination so far.
It looks like Sonnet 3.7 (extended thinking) would be a better architect than R1.
I'll be trying Sonnet 3.7 (extended thinking) as the architect with Sonnet 3.5 or Flash 2.0 as the editor, which I assume would land at the top of the leaderboard.
Given that 3.5 and 3.7 cost the same, it doesn't make sense to use 3.5 here.
I'd like to see that benchmark, but R1 + 3.7 should be cheaper than 3.7T (extended thinking) + 3.7.
The reason 3.5 makes more sense to me as the editor is its edit format success rate (99.6% vs. 93.3% for 3.7).
Flash 2.0 got 100% on the edit format, and it's extremely cheap, so I'm pretty curious how that would score.
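To put the edit format numbers quoted above in perspective, here is a rough back-of-the-envelope sketch that converts each success rate into expected malformed edits per 1,000 attempts. It assumes each edit attempt succeeds or fails independently at the quoted rate, which is a simplification; the labels are just the names used in this thread, not official model IDs.

```python
# Edit-format success rates as quoted in this thread (percent of edits applied cleanly).
success_rate = {
    "Sonnet 3.5": 99.6,
    "Sonnet 3.7": 93.3,
    "Flash 2.0": 100.0,
}

def failed_edits_per_1000(rate_pct: float) -> float:
    """Expected malformed edits per 1,000 attempts, assuming independent attempts."""
    return round((100.0 - rate_pct) * 10, 1)

for model, rate in success_rate.items():
    print(f"{model}: ~{failed_edits_per_1000(rate)} malformed edits per 1,000")
```

By this estimate 3.7 would produce about 67 malformed edits per 1,000 versus about 4 for 3.5, roughly 17 times as many, which is the gap behind preferring 3.5 as the editor despite the identical price.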