Comment by AussieWog93

1 day ago

Apparently inference itself is profitable, at least according to an interview I watched with Dario. They even cover the cost of training itself, if you look at it on a model-by-model basis.

The cash burn comes from models ballooning in size - they spend (as an example, not actual numbers) 100M on training + inference for the lifetime of Sonnet 3.5, make 200M from subscriptions/api keys while it's SOTA, but then have to somehow come up with 1B to train Opus 4.0.

To run some other back-of-the-envelope calcs: GLM 4.7 Air (the previous "good" local LLM) can generate ~70 tok/s on a Mac Mini, which equates to roughly 2,200 million tokens per year.

OpenRouter charges $0.40 per million tokens, so theoretically, if you were running that Mac Mini at 100% utilisation, you'd be generating $880 per annum "worth" of API usage.
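The throughput-to-revenue arithmetic above can be sketched in a few lines; all the inputs (70 tok/s, $0.40/M tokens) are the comment's illustrative figures, not measured or quoted prices:

```python
# Back-of-envelope: annual token output and API-equivalent revenue.
# All constants are the comment's illustrative numbers, not real benchmarks.
TOK_PER_S = 70                        # claimed local throughput on a Mac Mini
SECONDS_PER_YEAR = 365 * 24 * 3600    # 31,536,000

tokens_per_year = TOK_PER_S * SECONDS_PER_YEAR   # ~2.2 billion tokens
millions_of_tokens = tokens_per_year / 1e6       # ~2,200 M tokens

PRICE_PER_M_TOK = 0.40                # assumed $/1M tokens, OpenRouter-style
revenue = millions_of_tokens * PRICE_PER_M_TOK   # ~$880 per annum

print(f"{millions_of_tokens:,.0f}M tokens/year -> ${revenue:,.0f}/year")
```

Running it gives ~2,208M tokens and ~$883/year, which rounds to the $880 figure above.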

Assuming a power draw of something like 50W, you're only looking at ~440kWh per annum; at 20c per kWh that's ~$90 on power. The hardware itself is $499; depreciate that over 3 years and you're looking at ~$260 per year in costs to generate ~$880 in inference income.
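The cost side works out the same way; again, the power draw, electricity price, and 3-year straight-line depreciation are the comment's assumptions:

```python
# Annual running cost: electricity plus straight-line hardware depreciation.
# Inputs are the comment's assumed figures, not measurements.
POWER_W = 50                      # assumed average draw of the Mac Mini
HOURS_PER_YEAR = 24 * 365
kwh_per_year = POWER_W * HOURS_PER_YEAR / 1000   # ~438 kWh

PRICE_PER_KWH = 0.20              # assumed 20c/kWh electricity price
power_cost = kwh_per_year * PRICE_PER_KWH        # ~$88

HARDWARE_COST = 499.0             # Mac Mini purchase price
DEPRECIATION_YEARS = 3
annual_cost = power_cost + HARDWARE_COST / DEPRECIATION_YEARS  # ~$254

print(f"{kwh_per_year:.0f} kWh -> ${power_cost:.0f} power, "
      f"${annual_cost:.0f}/year all-in")
```

That lands around $254/year all-in, matching the ~$260 estimate against ~$880 of API-equivalent output.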