Comment by rainsford
5 hours ago
We moved from the mainframe era to desktops and smaller servers because computers got fast enough to do what we needed them to do locally. Centralized computing resources are still vastly more powerful than what's under your desk or in a laptop, but it doesn't matter because people generally don't need that much power for their daily tasks.
The problem with AI is that it's not obvious what the upper limit of capability demand might be. And until we get there, if we ever do, there will always be demand for the more capable models that run on centralized computing resources. Even if at some point I'm able to run a model on my local desktop that's equivalent to current Claude Opus, if what Anthropic is offering as a service is significantly better in a way that matters to my use case, I will still want to use the SaaS one.
> Even if at some point I'm able to run a model on my local desktop that's equivalent to current Claude Opus, if what Anthropic is offering as a service is significantly better in a way that matters to my use case, I will still want to use the SaaS one.
Only if it's competitively priced. You wouldn't want to use the SaaS if the breakeven in investment on local instances is a matter of months.
Right now people are shelling out for Claude Code and similar because for $200/m they can consume $10k/m of tokens. If you were actually paying $10k/m, then it would make sense to splurge $20k-$30k on a local instance.
The underlying advantage of local inference is that you're repurposing your existing hardware for free. You don't need your token spend to pay a share of the capex for datacenters large enough to draw gigawatts of power; you just pay for your own energy use. Even though the raw energy cost per operation will probably be higher for local inference, the overall savings in hardware costs can still be quite real.
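To put rough numbers on the breakeven argument above, here's a minimal sketch that takes the $200/m, $10k/m and $20k-$30k figures at face value. The function name and the `monthly_local_running_cost` parameter (electricity and so on, defaulting to zero) are my own assumptions, and real costs like depreciation or the capability gap between local and hosted models are ignored.

```python
def breakeven_months(hardware_cost, monthly_api_spend, monthly_local_running_cost=0.0):
    """Months until buying local hardware beats paying the monthly bill.

    All inputs are in the same currency. monthly_local_running_cost is a
    hypothetical catch-all for electricity etc., not a figure from the thread.
    """
    monthly_savings = monthly_api_spend - monthly_local_running_cost
    if monthly_savings <= 0:
        # Running locally costs at least as much per month: never breaks even.
        return float("inf")
    return hardware_cost / monthly_savings


# Figures quoted in the thread: $20k-$30k of hardware, compared against
# $10k/m at real token prices and against the $200/m subscription.
for hardware in (20_000, 30_000):
    at_token_prices = breakeven_months(hardware, 10_000)
    at_subscription = breakeven_months(hardware, 200)
    print(f"${hardware}: {at_token_prices:.0f} months vs {at_subscription:.0f} months")
```

At $10k/m the hardware pays for itself in 2 to 3 months; at the $200/m subscription price it takes 100 to 150 months, which is why nobody bothers while the subsidy lasts.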