Comment by sothatsit

4 days ago

DeepSeek's contributions to training efficiency were as important as, if not more important than, the models themselves. Much of the worry about DeepSeek came from people questioning the moat of the big AI players, since DeepSeek was able to train a competitive model with so much less compute.

Their innovations in training efficiency have almost certainly been studied closely by the big AI labs. For example, Dario Amodei describes the efficiency improvements as the truly important contribution of DeepSeek V3 here: https://www.darioamodei.com/post/on-deepseek-and-export-cont...

> DeepSeek's team did this via some genuine and impressive innovations, mostly focused on engineering efficiency. There were particularly innovative improvements in the management of an aspect called the "Key-Value cache", and in enabling a method called "mixture of experts" to be pushed further than it had before.

Almost all of High-Flyer's achievements have more to do with scaling the process, but when scaling is all you need, it's darn effective.