Comment by alex43578
17 days ago
I disagree with the original position that "You could train a state of the art model on cluster of 12+ year old boxes". Regardless of the country's resources, the best training methods can't make up for the vast difference in compute and scale. The best 70B or 100B models aren't close to GPT, Gemini, or Claude; and there's certainly no chance the best 100B models could have been trained with the compute reasonably available from a single source 10 years ago.
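A rough back-of-envelope sketch of why the scale gap is decisive (all numbers here are my own assumptions, not from the original post; it uses the common ~6·N·D FLOPs-per-token approximation and guesses generously at 2013-era per-box throughput):

```python
# Back-of-envelope: FLOPs to train a 100B-parameter model vs. what a
# cluster of ~2013-era boxes could plausibly deliver.
# All figures below are rough assumptions, not measurements.

params = 100e9                       # 100B parameters
tokens = 2e12                        # ~2T tokens (Chinchilla-style ~20 tokens/param)
train_flops = 6 * params * tokens    # common ~6*N*D training-cost approximation

# Assume one 12-year-old box sustains ~1 TFLOP/s, which is generous
# for commodity 2013 hardware, and a hypothetical 1,000-box cluster.
box_flops_per_sec = 1e12
cluster_size = 1000
cluster_flops_per_sec = box_flops_per_sec * cluster_size

seconds = train_flops / cluster_flops_per_sec
years = seconds / (3600 * 24 * 365)
print(f"Training FLOPs needed: {train_flops:.1e}")        # ~1.2e24
print(f"Wall-clock on a {cluster_size}-box cluster: ~{years:.0f} years")  # ~38 years
```

And that's arithmetic alone; interconnect bandwidth and memory capacity would become the binding constraint on such a cluster long before raw FLOPs did.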