Comment by booty

8 hours ago

Prevailing wisdom is that serving LLMs at a profit is achievable... it's when you factor in the cost of training them that prices get astronomical real fast.

Open-source model inference providers (who do not have to bear the cost of training) seem able to do it at much lower prices.

https://www.together.ai/pricing

https://fireworks.ai/pricing#serverless-pricing (scroll down to headline models)

Of course, it's possible that they are burning through investor cash as well, and apples-to-apples comparisons are not possible because AFAIK Google does not mention the size/paramcount for 3.5 Flash.

But if the prevailing wisdom is true, I think it's actually encouraging. It suggests that OpenAI and Anthropic could perhaps, if they need to, achieve profitability if they slow down model development and focus on tooling etc. instead. If true that's probably good news for everybody w.r.t. preventing a bursting of this economic bubble.

...my opinions here are of course, conjecture built on top of conjecture....

Most of the training cost is not in the final training run, it's in all of the R&D (including salaries, equity, etc.) that it takes to get to the final training run. The actual cost of all of the TPUs (or GPUs), power, networking, storage, etc. for the final training run is significant, but it's even more expensive to have this huge R&D team doing frontier model development and using a lot of those same resources during development.

I think you're right that releasing models at a slower cadence would bring down costs to some degree, but it's not clear how much. All of these companies could significantly reduce their opex but at the risk of falling behind in terms of being at the frontier.

Not to discredit you, because you are 100% correct but tangential note about together.ai, they seem fairly unreliable with constant outages or higher than normal latency.