Comment by alex43578

17 days ago

I disagree with the original position that "You could train a state of the art model on cluster of 12+ year old boxes". Regardless of the country's resources, the best training methods can't make up for the vast difference in compute and scale. The best 100B or 70B models aren't close to GPT, Gemini, or Claude, and there's certainly no chance the best 100B models could have been trained with the compute reasonably available from a single source 10 years ago.
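
As a rough illustration (every number here is my own assumption, not something from the original post: a 100B-parameter model trained Chinchilla-style on ~2T tokens, the standard ~6ND FLOPs estimate for dense transformers, and a hypothetical 1,000-GPU cluster of 2013-era K20X cards at an optimistic 30% sustained utilization), the back-of-envelope arithmetic looks like this:

```python
# Back-of-envelope training-compute estimate. All inputs are rough assumptions.

PARAMS = 100e9        # assumed model size: 100B parameters
TOKENS = 2e12         # assumed training data: ~20 tokens/param (Chinchilla-style)

# Standard ~6ND estimate for dense transformer training FLOPs.
total_flops = 6 * PARAMS * TOKENS               # ~1.2e24 FLOPs

# Hypothetical 2013-era cluster: 1,000 NVIDIA K20X GPUs
# (~3.9 TFLOP/s FP32 peak each) at 30% sustained utilization.
cluster_flops_per_sec = 1_000 * 3.9e12 * 0.30   # ~1.2e15 FLOP/s

seconds = total_flops / cluster_flops_per_sec
print(f"Total training compute: {total_flops:.2e} FLOPs")
print(f"Wall-clock time: {seconds / 3.156e7:.0f} years")   # ~32 years
```

Even under those generous assumptions you're looking at decades of wall clock, before counting interconnect bottlenecks, memory limits, and failure rates on hardware that old.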