← Back to context

Comment by pclmulqdq

5 months ago

Also for $36.83 compared to o1's $186.50

But also for $36.83 compared to DeepSeek R1 + claude-3-5 it's $13.29 and for latter "Percent using correct edit format" is 100% vs 97.8% for 3.7.

edit: would be interesting to see how combo DeepSeek R1 + claude-3-7 performs.

  • is there any public info on why such DeepSeek R1 + claude-3-5 combo worked better than using a single model?

    • Sonnet 3.5 is the best non-Chain-of-Thought code-authoring model. When paired with R1's CoT output, Sonnet 3.5 performs even better - outperforming vanilla R1 (and eveything else), which suggests Sonnet is better than R1 at utilizing R1's CoT.

      It's scenario where the result is greater than the sum of it's parts

    • From my experiments with the Deepseek Qwen-32b distill model, the Deepseek model did not follow the edit instructions - the format was wrong. I know the distill models are not at all the same as the full model, but that could provide a clue. Combine that information with the scores, then you have a reasonable hypothesis.

      1 reply →

    • My personal experience is that R1 is smarter than 3.5 sonnet, but 3.5 sonnet is a better coder. Thus it may be better to let R1 to tackle the problem, but let 3.5 sonnet to implement the solution.

      1 reply →