
Comment by pclmulqdq

1 year ago

Also for $36.83 compared to o1's $186.50

7 comments

pclmulqdq


But also, that is $36.83, whereas DeepSeek R1 + claude-3-5 costs $13.29, and for the latter "Percent using correct edit format" is 100% vs 97.8% for 3.7.

edit: it would be interesting to see how the DeepSeek R1 + claude-3-7 combo performs.
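
For a sense of scale, the ratios between the quoted costs can be checked with a few lines of Python. This is only a sketch: the dictionary keys are shorthand labels, and attributing the $36.83 figure to the 3.7 run is an assumption based on how the numbers are used in this thread.

```python
# Costs as quoted in this thread (USD). The labels are shorthand, and tying
# $36.83 to the 3.7 run is an assumption drawn from the thread's context.
costs = {
    "o1": 186.50,
    "claude-3-7": 36.83,
    "DeepSeek R1 + claude-3-5": 13.29,
}


def cost_ratio(a: str, b: str) -> float:
    """Return how many times more expensive configuration a is than b."""
    return costs[a] / costs[b]


if __name__ == "__main__":
    pairs = [
        ("o1", "claude-3-7"),
        ("claude-3-7", "DeepSeek R1 + claude-3-5"),
    ]
    for a, b in pairs:
        print(f"{a} costs {cost_ratio(a, b):.2f}x as much as {b}")
```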

tw1984 1 year ago
Is there any public info on why the DeepSeek R1 + claude-3-5 combo worked better than a single model?
- alienthrowaway 1 year ago
  
  Sonnet 3.5 is the best non-Chain-of-Thought code-authoring model. When paired with R1's CoT output, Sonnet 3.5 performs even better, outperforming vanilla R1 (and everything else), which suggests Sonnet is better than R1 at utilizing R1's CoT.
  It's a scenario where the result is greater than the sum of its parts.
- Ballas 1 year ago
  
  From my experiments with the DeepSeek Qwen-32b distill model, the DeepSeek model did not follow the edit instructions - the format was wrong. I know the distill models are not at all the same as the full model, but that could provide a clue. Combine that information with the scores and you have a reasonable hypothesis.
  
- WiSaGaN 1 year ago
  
  My personal experience is that R1 is smarter than 3.5 Sonnet, but 3.5 Sonnet is a better coder. Thus it may be better to let R1 tackle the problem and let 3.5 Sonnet implement the solution.
  
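
The split described in the replies above (one model reasons about the problem, another writes the edit) can be sketched roughly as below. This is a minimal illustration, not how any particular tool implements it: `Model`, `solve_with_pair`, and the prompt wording are all hypothetical, and the two models are stand-in functions rather than real API clients.

```python
from typing import Callable

# Hypothetical type: any function that maps a prompt string to a completion.
# A real setup would wrap an API client here; none is assumed in this sketch.
Model = Callable[[str], str]


def solve_with_pair(task: str, source: str, reasoner: Model, editor: Model) -> str:
    """Two-stage flow: the reasoner plans the change, the editor writes it."""
    # Stage 1: the reasoning model only describes what should change.
    plan = reasoner(
        f"Task:\n{task}\n\nCode:\n{source}\n\nDescribe the change needed."
    )
    # Stage 2: the editing model turns that plan into a formatted edit.
    return editor(
        f"Plan:\n{plan}\n\nCode:\n{source}\n\nReturn the edit in the required format."
    )


# Stand-in models so the sketch runs without network access.
def fake_reasoner(prompt: str) -> str:
    return "Rename function foo to bar."


def fake_editor(prompt: str) -> str:
    # The editor sees the plan, so formatting is its only remaining job.
    if "Rename function foo to bar." in prompt:
        return "foo -> bar"
    return "no edit"


if __name__ == "__main__":
    print(solve_with_pair("rename foo", "def foo(): pass", fake_reasoner, fake_editor))
```

The point of the split is that the editor never has to solve the problem itself, which would fit the observation above that the edit format was where the reasoning model struggled.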