Comment by tsimionescu

1 year ago

I think by far the biggest advances are related to compute power. The amount of processing needed to run training algorithms over the volume of data the latest models require just wasn't possible even five years ago, and definitely not ten years ago.

I'm sure there are optimizations coming from the model architecture as well, but I don't think running the best algorithms we have today on hardware from five to ten years ago would have worked in any reasonable amount of time or money.

A 30bn param model, hell even a 7bn param model, is still incredibly useful and I feel like that could have been doable a decade ago!

We have GPT-4 (or at least 3.5) tier performance in these much smaller models now. If we teleported back in time, it might have been possible to build one even then.

  • I think the size of the model is only one part of it. They're still training these 7bn parameter models on the whole data set, and just crunching through that takes enormous compute, which people simply didn't have at current price points until now (a rough cost estimate is sketched at the end of this comment).

    I should also mention that the very idea of using GPUs for general-purpose compute, and then specifically for AI training, was an innovation in itself. And the bet that simply scaling up would be worth the investment was another major one. It's not just the existence of the compute power; it's its application to NN training tasks that got us here.

    Here[0] is an older OpenAI post about this very topic. They estimate that between 2012 and 2018, the compute used to train the SotA models of the day increased roughly 300,000x, doubling every ~3.4 months (I sanity-check those figures below).

    [0] https://openai.com/index/ai-and-compute/
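
    For a rough sense of the training cost (my own sketch, not from the linked post): the scaling-laws literature commonly approximates training compute as ~6 × parameters × tokens FLOPs. Plugging in a 7bn parameter model, an assumed 1-trillion-token dataset, and the rough peak throughput of a mid-2010s GPU:

      # Back-of-the-envelope training cost for a 7bn parameter model, using
      # the common ~6 * params * tokens FLOPs approximation. The token count
      # and GPU throughput below are illustrative assumptions, not measured.
      params = 7e9              # 7bn parameters
      tokens = 1e12             # assume ~1 trillion training tokens
      train_flops = 6 * params * tokens   # ~4.2e22 FLOPs

      gpu_flops = 4e12          # ~4 TFLOP/s peak FP32, roughly a 2014-era card
      seconds = train_flops / gpu_flops
      print(f"~{seconds / 86400 / 365:.0f} GPU-years at full utilization")
      # -> roughly 330 GPU-years on a single such card

    Even assuming perfect utilization, that kind of run just wasn't affordable at the price points of a decade ago.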
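
    And a quick sanity check on the figures in [0] (the ~62-month window is my own rough reading of the AlexNet-to-AlphaGo Zero span the post covers, not a number taken from it):

      import math

      # Implied doubling time for a ~300,000x growth in training compute
      # over roughly the AlexNet (2012) -> AlphaGo Zero (late 2017) span.
      growth = 300_000
      months = 62                    # ~5.2 years between the two endpoints
      doublings = math.log2(growth)  # ~18.2 doublings
      print(f"~{months / doublings:.1f} months per doubling")
      # -> ~3.4 months, consistent with the post's headline figure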