Comment by bob1029
1 hour ago
> And yes, if you want the absolute best, Opus 4.8 exists. It also costs more per 20 minutes of heavy use than I paid for this entire GPU and adapter setup combined. But the gap is shockingly small.
I don't think this is a fair characterization of the situation. I use frontier models via API pre-paid tokens every single day, and I can barely rack up $100 per month. The fact that we figured out how to burn double this in 20 minutes is impressive, but I don't think it reflects the reality that many are experiencing right now. There are some exceptionally gluttonous approaches to harnessing LLMs that I think are serving as convenient straw men in these discussions.
Paying for the API will almost always be more economical than self-hosting equivalent infrastructure. I am not against self-hosting, but the article suggests a primarily economic motivation for this effort. If you are consuming fewer than 10^9 tokens per month, I really don't think it's worth your time to try to compete with the hyperscalers. Most of the money is to be found in the integration of this technology with existing businesses.
Claude is something like $35 per million tokens. If I were using API pricing, I could trivially spend $100 in a single hour-long coding session, or in about 10 minutes with /fast turned on. Not sure how you guys are using it.
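For scale, here is a rough sketch of what those figures imply in token terms. It assumes the $35 per million tokens quoted above is a single blended rate (no split between input and output pricing) and that spending is spread evenly over the session; the function name is made up for illustration.

```python
def tokens_for_budget(budget_usd: float, price_per_mtok_usd: float) -> float:
    """Tokens you can buy for a budget at a flat price per million tokens."""
    return budget_usd / price_per_mtok_usd * 1_000_000


# Figures taken from the comment: $100 spent at roughly $35 per million tokens.
BUDGET_USD = 100.0
PRICE_PER_MTOK_USD = 35.0  # assumed to be one blended rate

tokens = tokens_for_budget(BUDGET_USD, PRICE_PER_MTOK_USD)

# Average throughput needed to burn the budget in 60 minutes vs. 10 minutes.
per_minute_hour = tokens / 60
per_minute_fast = tokens / 10

print(f"tokens for ${BUDGET_USD:.0f}: {tokens:,.0f}")
print(f"tokens/min over an hour: {per_minute_hour:,.0f}")
print(f"tokens/min over 10 minutes: {per_minute_fast:,.0f}")
```

So the claim amounts to roughly 2.9 million tokens per session, which is plausible only if large contexts are being resent on most turns.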
coding is the easy part of using claude
Opus is normally $5 per mtok, no idea why anyone would use /fast if they were at all concerned about price. ($5 is still pricy though tbh)
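To put the thread's numbers side by side, a back-of-the-envelope sketch: it prices the 10^9 tokens per month threshold at the two per-million rates quoted in this thread ($5 and $35), and converts a $100 monthly spend back into tokens. Both rates are treated as flat blended prices, which is an assumption; the function names are illustrative only.

```python
def monthly_cost_usd(tokens_per_month: float, price_per_mtok_usd: float) -> float:
    """API cost for a monthly token volume at a flat price per million tokens."""
    return tokens_per_month / 1_000_000 * price_per_mtok_usd


def tokens_for_budget(budget_usd: float, price_per_mtok_usd: float) -> float:
    """Tokens a monthly budget buys at a flat price per million tokens."""
    return budget_usd / price_per_mtok_usd * 1_000_000


THRESHOLD_TOKENS = 10**9  # the self-hosting threshold suggested upthread

# The two per-million-token prices quoted in the thread (assumed blended).
for price in (5.0, 35.0):
    cost = monthly_cost_usd(THRESHOLD_TOKENS, price)
    print(f"10^9 tokens/month at ${price:.0f}/Mtok: ${cost:,.0f}")

# What a $100/month spend corresponds to at the lower quoted price.
light_use_tokens = tokens_for_budget(100.0, 5.0)
print(f"$100/month at $5/Mtok: {light_use_tokens:,.0f} tokens")
```

Under these assumptions the threshold corresponds to $5,000 to $35,000 per month in API spend, while $100 per month is about 20 million tokens, some 50 times below it.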
I use hosted providers myself, but I can easily churn through $100 worth of tokens in half a day, even with cheap models like DeepSeek. If someone's use is as light as yours, then sure - grab a subscription and you'll save far more. For higher use, whether it is worth offloading at least some of it will come down to how cheap your electricity is (for me it's not, FWIW).