Comment by andy99
16 hours ago
Gemini Flash-Lite is $0.10/million input tokens; Claude Haiku is $1/million. Obviously input dominates here if it's just a classifier. Training data can easily top 10 trillion tokens: an earlier Kimi K2 was trained on 15T, and even HF's SmolLM 3B was trained on 11T.
So if I calculate right, that's $100k-$1M per trillion tokens, or $1-10M for a full dataset.
That's way more than I expected; there is probably also some discount at that volume :)
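The arithmetic above can be sketched as a quick back-of-the-envelope calculation (the prices are the per-million-input-token rates quoted in the comment, and the 10T-token corpus size is the assumption used there):

```python
# Rough cost of running a cheap LLM classifier over a pretraining corpus.
# Prices: USD per million input tokens, as quoted above.
price_per_m_tokens = {"gemini-flash-lite": 0.10, "claude-haiku": 1.00}

TOKENS_PER_TRILLION = 1_000_000_000_000
dataset_tokens = 10 * TOKENS_PER_TRILLION  # assume a ~10T-token corpus

for model, price in price_per_m_tokens.items():
    per_trillion = price * TOKENS_PER_TRILLION / 1_000_000
    full_dataset = price * dataset_tokens / 1_000_000
    print(f"{model}: ${per_trillion:,.0f}/trillion tokens, "
          f"${full_dataset:,.0f} for the full corpus")
```

Running it reproduces the range in the comment: $100k-$1M per trillion tokens, and $1M-$10M for a 10T-token corpus.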