Comment by codemog
6 days ago
Interesting. I see papers where researchers finetune models in the 7B to 12B range and beat, or at least be competitive with, frontier models. I wish I knew how this was possible, or had more intuition about such things. If anyone has paper recommendations, I'd appreciate it.
They're using a revolutionary new method called "training on the test set".
So, curve fitting the training data? Then we should expect out-of-sample accuracy to be crap?
Yeah, that's usually what happens with those tiny models that look amazing on benchmarks.
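Here's a toy sketch of the contamination effect being joked about above. Everything here is hypothetical (the data, the "model", the names): the features carry zero signal about the labels, and the model is just a 1-nearest-neighbor memorizer, so any score above chance can only come from having seen the eval data during training.

```python
import random

random.seed(0)

# Hypothetical data: random features, random binary labels -- no real signal.
def make_data(n):
    return [([random.random() for _ in range(4)], random.randint(0, 1))
            for _ in range(n)]

train = make_data(200)

# A "model" that simply memorizes its training pairs (1-nearest-neighbor).
def predict(x, memory):
    nearest = min(memory,
                  key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)))
    return nearest[1]

def accuracy(data, memory):
    return sum(predict(x, memory) == y for x, y in data) / len(data)

# "Benchmark" that leaked into training: a perfect score.
print(accuracy(train, train))      # 1.0

# Genuinely held-out data: back to roughly coin-flip accuracy.
held_out = make_data(200)
print(accuracy(held_out, train))   # ~0.5
```

The gap between the two numbers is the whole trick: the contaminated eval measures memorization, not generalization, which is why those benchmark wins evaporate out of sample.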