Comment by simonw
4 hours ago
The DeepSeek v3 paper claims to have trained from scratch for ~$5.5m: https://arxiv.org/pdf/2412.19437
Kimi K2 Thinking was reportedly trained for $4.6m: https://www.cnbc.com/2025/11/06/alibaba-backed-moonshot-rele...
Both of those were frontier models at the time of their release.
Another interesting number here is Claude 3.7 Sonnet, which many people (myself included) considered the best model for several months after its release, and which was apparently trained for "a few tens of millions of dollars": https://www.oneusefulthing.org/p/a-new-generation-of-ais-cla...