Comment by xianshou

1 year ago

Crazy how simple the technique is if this holds up. Just <think> and <reflection> plus synthetic data, used to finetune Llama 3.1 70B.

Note that there's a threshold for how smart the model has to be to take advantage of this flow (https://x.com/mattshumer_/status/1831775436420083753) - 8B is too dumb.

In which case, what happens if you apply this to a GPT-4o finetune, or to Claude 3.5 Sonnet?

What happens if you combine it with variants of tree-based reasoning? With AlphaProof (https://www.nature.com/articles/s41586-023-06747-5#Sec3)? With MCTSr (https://arxiv.org/abs/2406.07394)?

jug 1 year ago

I was just thinking: since GPT-4o and Sonnet are closed models, do we know that this method wasn't already used to train them? If so, Reflection may simply be finding a path to greater improvements than they did, since Llama 3.1 apparently didn't improve as much. It's just a thought, though.

hdeezy 1 year ago
If they had, this thing wouldn't be trading punches with them at its size.
- segmondy 1 year ago
  
  Sonnet does something like this. See: https://tyingshoelaces.com/blog/forensic-analysis-sonnet-pro...
- itorcs 1 year ago
  
  What parameter size are 4o and sonnet?