Comment by bjt12345
3 days ago
Here's my take on it though...
Just as we had the golden era of the internet in the late 90s, when the WWW was an eden of certificate-less homepages with spinning skulls on GeoCities and no ad tracking, we are now in the golden era of agentic coding, where massive companies eat eye-watering losses so we can use models without a second thought.
But this won't last, and Local Llamas will become a compelling option, particularly once there's a big second-hand market of GPUs from liquidated companies.
Yes. This heavily subsidized LLM inference will not last forever.
We have already seen cost cutting for some models. A model starts strong, but over time the parent company switches to heavily quantized versions to save on compute costs.
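For anyone unfamiliar, "quantized" here just means storing weights in fewer bits. A toy NumPy sketch (my own illustration, not any provider's actual pipeline) of symmetric int8 quantization, the kind of trick that cuts memory and bandwidth roughly 4x at some accuracy cost:

```python
import numpy as np

# Toy example: quantize a float32 weight matrix down to int8.
rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)

scale = np.abs(w).max() / 127.0            # one scale for the whole tensor
w_q = np.round(w / scale).astype(np.int8)  # quantized weights, 1 byte each
w_dq = w_q.astype(np.float32) * scale      # dequantized copy used at inference

print(w.nbytes // w_q.nbytes)              # 4x smaller in memory
print(np.abs(w - w_dq).max() < scale)      # rounding error is bounded by scale
```

Real deployments use fancier per-channel or 4-bit schemes, but the economics are the same: fewer bits, cheaper serving, slightly worse answers.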
Companies are bleeding money, and eventually this will need to adjust, even for a behemoth like Google.
That is why running local models is important.
Unfortunately, GPUs die in datacenters very quickly, and GPU manufacturers don't care about hardware longevity.
Yep, when the tide goes out, no company will be able to keep swimming naked and offering this stuff for free.