Comment by StrauXX

20 hours ago

Self-hosting at a reasonable scale is much cheaper than people think. I run clusters of DGX Spark machines behind BiFrost load balancers, both in our company and for client projects. They work flawlessly!

128 GB of unified memory, an Nvidia GPU, and an ARM CPU for only around 3k€ net. Each device easily pushes ~400 input and ~100 output tokens per second on, say, gpt-oss-120b. With two devices in a cluster, that's enough performance for >20 concurrent RAG users or >3 "AI-augmented" developers.
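For what it's worth, the concurrency claim checks out as simple arithmetic. A minimal sketch, assuming each RAG user averages roughly 10 output tokens/s of generation demand while a response streams (my assumption, not a measured number) and that the balancer spreads load evenly:

```python
# Back-of-envelope capacity check for the numbers above.
# Assumed: ~10 output tokens/s average demand per RAG user;
# even load sharing across devices.

OUTPUT_TPS_PER_DEVICE = 100   # per-device figure quoted for gpt-oss-120b
DEVICES = 2
AVG_TPS_PER_RAG_USER = 10     # assumed average per-user generation demand

def concurrent_users(devices: int = DEVICES,
                     tps_per_device: float = OUTPUT_TPS_PER_DEVICE,
                     tps_per_user: float = AVG_TPS_PER_RAG_USER) -> float:
    """Sustained cluster throughput divided by per-user demand."""
    return devices * tps_per_device / tps_per_user

print(concurrent_users())  # 20.0 with these assumptions
```

Users don't generate continuously (they read between requests), so the real sustainable concurrency is higher than this steady-state figure, which is why ">20" is conservative.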

And they don't even pull that much power.

Factor in depreciation and energy costs, and a subscription might just be cheaper.

  • It is definitely cheaper now. My point is that a future where token costs rise so dramatically that AI usage becomes uneconomical is not a high-probability one, even if AI subscriptions were sold heavily below cost (which is also unlikely, once R&D is set aside).
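A rough monthly comparison under stated assumptions. Everything except the ~3,000€ device price and the two-device cluster (both quoted above) is my assumption: 3-year straight-line depreciation, ~100 W average draw per device, 0.30 €/kWh, and a hypothetical 20€/user/month subscription serving the same 20 users:

```python
# Rough monthly cost comparison: self-hosted cluster vs. subscriptions.
# Assumed: 36-month depreciation, 100 W average draw per device,
# 0.30 EUR/kWh, 20 users on a hypothetical 20 EUR/month plan.

DEVICE_PRICE_EUR = 3000           # quoted device price
DEVICES = 2
DEPRECIATION_MONTHS = 36          # assumed 3-year write-off
AVG_WATTS_PER_DEVICE = 100        # assumed average draw
EUR_PER_KWH = 0.30                # assumed energy price
HOURS_PER_MONTH = 730

hardware = DEVICE_PRICE_EUR * DEVICES / DEPRECIATION_MONTHS
energy = DEVICES * (AVG_WATTS_PER_DEVICE / 1000) * HOURS_PER_MONTH * EUR_PER_KWH
self_hosted = hardware + energy

subscription = 20 * 20            # 20 users x 20 EUR/month (assumed)

print(f"self-hosted:   {self_hosted:.0f} EUR/month")
print(f"subscriptions: {subscription} EUR/month")
```

With these assumptions the cluster lands around 210€/month against 400€/month in subscriptions; the conclusion flips only if the hardware sits mostly idle or energy prices are far higher.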