Comment by aspenmartin
2 days ago
It's more than the raw hardware, it's the interconnect and communication between the hardware at scale. These models are trained on hundreds of thousands of GPUs today. You _will_ start to see cross-datacenter training runs but this needs to efficiently decide when and how to communicate across datacenter, which bears a very high cost compared to intra-datacenter communication.
No comments yet
Contribute on Hacker News ↗