Comment by tsimionescu

1 year ago

I think the size of the model is only one part of it. They're still training these 7bn-parameter models on the whole dataset, and just crunching through that takes enormous compute, which people simply didn't have at current price points until now.

I should also mention that the very idea of using GPUs for general-purpose compute, and then specifically for AI training, was an innovation. And the idea that simply scaling up would be worth the investment is another major one. It's not just the existence of the compute power; it's its application to NN training that got us here.

Here[0] is an older OpenAI post about this very topic. They estimate that between 2012 and 2018, the compute used to train the SotA models of the time increased roughly 300,000x, doubling every ~3.4 months.
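
A quick back-of-the-envelope check of how those two figures fit together (a sketch; the 300,000x and 3.4-month numbers are taken from the linked post, everything else is just arithmetic):

```python
import math

# Figures reported in OpenAI's "AI and Compute" post
growth_factor = 300_000        # overall increase in training compute, 2012-2018
doubling_time_months = 3.4     # reported doubling time

# 300,000x ~= 2^18.2, i.e. about 18 doublings;
# at ~3.4 months per doubling that's ~62 months, or roughly 5 years,
# which is consistent with the 2012-2018 window.
doublings = math.log2(growth_factor)
span_months = doublings * doubling_time_months

print(f"{doublings:.1f} doublings over ~{span_months:.0f} months (~{span_months / 12:.1f} years)")
```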

[0] https://openai.com/index/ai-and-compute/