
Comment by doctoboggan

5 months ago

Have you tried Claude 3.7 as the editor with DeepSeek R1 as the architect? Since "DeepSeek R1 + claude-3-5-sonnet-20241022" is in second place, "DeepSeek R1 + claude-3-7" might be the highest-ranking combination so far.

It looks like Sonnet 3.7 (extended thinking) would be a better architect than R1.

I'll be trying out Sonnet 3.7 (extended thinking) as the architect with Sonnet 3.5 or Flash 2.0 as the editor, which I expect would top the leaderboard.

  • Given that 3.5 and 3.7 cost the same, it doesn't make sense to use 3.5 here.

    I'd like to see that benchmark, but R1 + 3.7 should be cheaper than 3.7T (3.7 with extended thinking) + 3.7.

    • The reason 3.5 makes more sense to me as the editor is its edit format success rate (99.6% vs. 93.3% for 3.7).

      Flash 2.0 got 100% on the edit format, and it's extremely cheap, so I'm pretty curious how that would score.