Comment by codemog
6 days ago
Interesting. I see papers where researchers finetune models in the 7B to 12B range and beat, or at least be competitive with, frontier models. I wish I knew how this was possible, or had more intuition about such things. If anyone has paper recommendations, I'd appreciate it.
They're using a revolutionary new method called "training on the test set".
So, curve fitting the training data? Then we should expect out-of-sample accuracy to be crap?
Yeah, that's usually what happens with those tiny models that look amazing on benchmarks.
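Here's a toy sketch of the contamination effect being joked about above. Everything here is hypothetical (the data, the "model", the names): the features carry zero signal about the labels, and the model is just a 1-nearest-neighbor memorizer, so any score above chance can only come from having seen the eval data during training.

```python
import random

random.seed(0)

# Hypothetical data: random features, random binary labels -- no real signal.
def make_data(n):
    return [([random.random() for _ in range(4)], random.randint(0, 1))
            for _ in range(n)]

train = make_data(200)

# A "model" that simply memorizes its training pairs (1-nearest-neighbor).
def predict(x, memory):
    nearest = min(memory,
                  key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)))
    return nearest[1]

def accuracy(data, memory):
    return sum(predict(x, memory) == y for x, y in data) / len(data)

# "Benchmark" that leaked into training: a perfect score.
print(accuracy(train, train))      # 1.0

# Genuinely held-out data: back to roughly coin-flip accuracy.
held_out = make_data(200)
print(accuracy(held_out, train))   # ~0.5
```

The gap between the two numbers is the whole trick: the contaminated eval measures memorization, not generalization, which is why those benchmark wins evaporate out of sample.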