Comment by simonw

18 hours ago

Plenty of companies have revealed exactly how much energy and CO2 they used training a model. Just off the top of my head, I've seen those numbers for Meta's Llama models, Microsoft's Phi series and DeepSeek's models, including their impressive DeepSeek v3, which trained for less than $6m in cost. That's a huge reduction compared to other similar models, and a useful illustration of how much more efficient this stuff can get on the training side of things.