Comment by loveparade
5 hours ago
> If you used the same architecture as GPT2 today you're in for a bad time training a new frontier model. It's only because we have dozens of breakthroughs
What exactly are these dozens of breakthroughs? Most frontier model architectures today still look very much like GPT2 at their core. There have been various improvements: InstructGPT, fine-tuning techniques, efficiency gains from KV caches, faster attention, LoRA, better tokenizers, etc. Most of these are about making things run faster. The biggest differentiators have probably been data curation, post-training data, and the ability to fit more into the model. But I think we've had few breakthroughs that would fall into the category of genuinely different technology.
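To make the KV-cache point concrete, here is a minimal sketch (with hypothetical shapes and names, not any particular model's code) of single-head attention decoding with a growing key/value cache. The optimization changes nothing about the architecture or the result; it just avoids recomputing keys and values for the prefix at every step:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(q, K, V):
    # q: (d,), K/V: (t, d) -> attention-weighted sum over t positions
    scores = K @ q / np.sqrt(q.shape[-1])
    return softmax(scores) @ V

rng = np.random.default_rng(0)
d, T = 8, 5
x = rng.standard_normal((T, d))                 # token representations
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

# Incremental decoding: append each new key/value to a cache.
K_cache, V_cache, cached_out = [], [], []
for t in range(T):
    K_cache.append(x[t] @ Wk)
    V_cache.append(x[t] @ Wv)
    cached_out.append(attend(x[t] @ Wq, np.array(K_cache), np.array(V_cache)))

# Full causal recomputation at every step, for comparison.
full_out = [attend(x[t] @ Wq, x[:t + 1] @ Wk, x[:t + 1] @ Wv) for t in range(T)]

assert np.allclose(cached_out, full_out)        # identical outputs
```

Same math, same outputs; the cache only trades memory for compute, which is why it counts as an efficiency improvement rather than a new technology.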