Comment by zackangelo
1 day ago
Yes, this is true. A lot of the time, labs hold back the infrastructure pieces they need to train huge models reliably and on a practical time scale. For example, many have custom alternatives to NVIDIA's NCCL library for fast collective communication (all-reduce and similar operations) between GPUs during distributed training.
DeepSeek published a lot of their work in this area earlier this year, and as a result the barrier isn't as high as it used to be.
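To make the NCCL point concrete, here's a minimal sketch (my own illustration, not anything from a lab's internal stack) of the kind of collective these custom libraries accelerate, using PyTorch's built-in NCCL backend; the script name and tensor size are made up.

```python
# Minimal sketch, assuming PyTorch built with CUDA/NCCL and launched via torchrun.
import os
import torch
import torch.distributed as dist

def main():
    # torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK for each GPU process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank holds its own "gradients"; all_reduce sums them in place across
    # every GPU. This is the core collective in data-parallel training, and the
    # layer labs swap out for their custom NCCL alternatives.
    grads = torch.ones(4, device="cuda") * dist.get_rank()
    dist.all_reduce(grads, op=dist.ReduceOp.SUM)
    print(f"rank {dist.get_rank()}: {grads.tolist()}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Run with something like `torchrun --nproc_per_node=2 allreduce_demo.py` on a machine with two GPUs; every rank should print the same summed tensor.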