Comment by DaiPlusPlus
19 hours ago
> I wonder how long it'll be before all AI costs are flat unlimited monthly fees or even free across the board, without compromise.
That's already the case if you can self-host an LLM; you don't even need a mythical H200: gamer-grade GeForce cards can get you a long way there (if this page is to be believed: https://www.runpod.io/gpu-compare/rtx-5090-vs-h200 )
...after RAM prices return to normal, of course, and then after another 2 or 3 generations of GPU development, for a 96GB HBM card to hit the streets. That also assumes SotA or cloud-only LLMs don't experience lifestyle inflation, but I assume they must: OpenAI/Anthropic/etc.'s business model depends on people paying for access, so it's in their interests to make running the models locally as difficult as possible.
Give it 5 years and reassess.
That page compares models that easily fit in the VRAM of either GPU. The biggest difference comes when one card can fit a model and the other cannot.
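
To put rough numbers on the "fits vs. doesn't fit" point, here's a back-of-the-envelope sketch. It assumes the weights dominate memory use and reserves 20% headroom for KV cache and activations; that headroom figure, the helper names, and the example model sizes (8B, 70B) are my own illustrative assumptions, not anything from the linked page. The memory sizes are the published 32 GB (RTX 5090) and 141 GB (H200).

```python
# Rough check: do a model's weights fit in a GPU's memory?
# Assumptions (illustrative only): weights dominate memory use, and
# ~20% of the card is kept free for KV cache and activations.

GPU_MEMORY_GB = {"RTX 5090": 32, "H200": 141}  # published memory sizes


def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: int, gpu: str,
         headroom: float = 0.2) -> bool:
    """True if the weights fit in the memory left after the headroom."""
    usable = GPU_MEMORY_GB[gpu] * (1 - headroom)
    return weights_gb(params_billion, bits_per_weight) <= usable


if __name__ == "__main__":
    # Hypothetical model sizes, chosen just to show both regimes.
    for params, bits in [(8, 16), (70, 4), (70, 16)]:
        for gpu in GPU_MEMORY_GB:
            verdict = "fits" if fits(params, bits, gpu) else "does not fit"
            print(f"{params}B @ {bits}-bit on {gpu}: {verdict}")
```

Under these assumptions a small model fits on both cards, which is the regime that page benchmarks, while a 70B model quantized to 4 bits (about 35 GB of weights) only fits on the H200. That second case is where the gap stops being about throughput and becomes about whether the model runs on one card at all.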