Comment by andy99
16 hours ago
Gemini Flash-Lite is $0.10/million input tokens; Claude Haiku is $1/million. Obviously input dominates here if it's just a classifier. Training data can easily top 10 trillion tokens: an earlier Kimi K2 was trained on 15T, and even HF's SmolLM 3B was trained on 11T.
So if I calculate right, that's $100k-$1M per trillion tokens, or $1-10M for a full dataset.
That's way more than I expected; there is probably also some discount at that volume :)
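The arithmetic above can be sketched as a quick back-of-the-envelope calculation (the prices are the per-million-input-token rates quoted in the comment, and the 10T-token corpus size is the assumption used there):

```python
# Rough cost of running a cheap LLM classifier over a pretraining corpus.
# Prices: USD per million input tokens, as quoted above.
price_per_m_tokens = {"gemini-flash-lite": 0.10, "claude-haiku": 1.00}

TOKENS_PER_TRILLION = 1_000_000_000_000
dataset_tokens = 10 * TOKENS_PER_TRILLION  # assume a ~10T-token corpus

for model, price in price_per_m_tokens.items():
    per_trillion = price * TOKENS_PER_TRILLION / 1_000_000
    full_dataset = price * dataset_tokens / 1_000_000
    print(f"{model}: ${per_trillion:,.0f}/trillion tokens, "
          f"${full_dataset:,.0f} for the full corpus")
```

Running it reproduces the range in the comment: $100k-$1M per trillion tokens, and $1M-$10M for a 10T-token corpus.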