Comment by nothinkjustai
10 hours ago
> It's widely understood that the big players are making profit on inference.
Are they? Or are they just saying that to make their offerings more attractive to investors?
Plus I think most people using agents for coding are using subscriptions which they are definitely not profitable in.
Locally running models that are snappy and mostly as capable as current SOTA models would be a dream. No internet connection required, no payment plans, no relying on a third-party provider to do your job. No privacy concerns. Etc etc.
> Plus I think most people using agents for coding are using subscriptions which they are definitely not profitable in.
Where on earth do people get this idea? Subscriptions that are based around obscure, vendor-defined "credits" are the perfect business model for vendors. They can change how much usage you get whenever they want.
It's likely they occasionally make a loss on some users, but in general subscriptions are highly profitable for AI companies:
> Anthropic last month projected it would generate a 40% gross profit margin from selling AI to businesses and application developers in 2025
and
> OpenAI projected a gross margin of around 46% in 2025, including inference costs of both paying and nonpaying ChatGPT users.
https://archive.is/aKFYZ#selection-1075.0-1083.119
Both of those companies are losing hella money, dude. Just cuz they say they “expect” to be profitable doesn’t mean they are.
You can pick models that are snappy, or models that are as capable as SOTA. You don't really get both unless you spend extremely unreasonable amounts of money on what is essentially a datacenter-scale inference platform of your own, meant to service hundreds of users at once. (I don't care how many agent harnesses you spin up at once, you aren't going to get the same utilization as hundreds of concurrent users.)
This assessment might change if local AI frameworks start working seriously on support for tensor-parallel distributed inference. Then you might get away with cheaper homelab-class hardware and only mildly unreasonable amounts of money.
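To put rough shape on the utilization point, here is a back-of-the-envelope sketch. Every number in it (rig cost, lifetime, peak throughput, both utilization figures) is a made-up placeholder, not a measurement of any real hardware or provider, and the helper `cost_per_million_tokens` is hypothetical. It only counts amortized hardware cost and ignores power, cooling, and staffing.

```python
def cost_per_million_tokens(hardware_cost_usd, lifetime_hours,
                            peak_tokens_per_sec, utilization):
    """Amortized hardware cost (USD) per million generated tokens.

    utilization is the fraction of peak throughput actually used,
    averaged over the hardware's lifetime.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_served = peak_tokens_per_sec * utilization * lifetime_hours * 3600
    return hardware_cost_usd / tokens_served * 1_000_000


# Hypothetical placeholders, chosen only to illustrate the ratio.
RIG_COST_USD = 200_000          # assumed price of a multi-GPU inference box
LIFETIME_HOURS = 3 * 365 * 24   # assumed three-year amortization
PEAK_TOKENS_PER_SEC = 5_000     # assumed aggregate batched throughput

# A provider batching hundreds of concurrent users vs. one person's agents.
shared = cost_per_million_tokens(RIG_COST_USD, LIFETIME_HOURS,
                                 PEAK_TOKENS_PER_SEC, utilization=0.60)
solo = cost_per_million_tokens(RIG_COST_USD, LIFETIME_HOURS,
                               PEAK_TOKENS_PER_SEC, utilization=0.02)

print(f"shared: ${shared:.2f} per 1M tokens")
print(f"solo:   ${solo:.2f} per 1M tokens")
print(f"solo costs {solo / shared:.0f}x more per token")
```

Whatever the real figures are, the per-token cost on identical hardware scales with the inverse of utilization, so the gap between the two cases is just the ratio of the utilization figures.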