
Comment by teaearlgraycold · 7 hours ago

It will be very interesting to see what happens when LLM providers start charging users their true cost. With many people priced out, how would they cope?

May happen, but I suspect not in the way implied by that question.

Hardware is still improving, though not as fast as it used to; it's very plausible that even today's largest open-weights models will run on affordable PCs and laptops within 5 years, and on high-end smartphones within 7.
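
For a sense of what "runs on affordable hardware" actually requires, here's a back-of-envelope memory estimate. The parameter counts, 4-bit quantization, and 20% overhead factor are my illustrative assumptions, not figures from this thread:

```
# Back-of-envelope: memory needed to hold a model's weights.
# Assumption: weights dominate, at quant_bits/8 bytes per parameter,
# plus ~20% overhead for KV cache and activations. Real requirements
# vary with context length and runtime.

def weights_gb(params_billions: float, quant_bits: int = 4, overhead: float = 1.2) -> float:
    """Approximate memory (GB) for a model at a given quantization."""
    return params_billions * (quant_bits / 8) * overhead

# Hypothetical model sizes, for illustration only:
for params in (8, 70, 405):
    print(f"{params}B @ 4-bit ≈ {weights_gb(params):.0f} GB")
# 8B   @ 4-bit ≈   5 GB -> fits a laptop today
# 70B  @ 4-bit ≈  42 GB -> needs a high-RAM desktop
# 405B @ 4-bit ≈ 243 GB -> still out of consumer reach
```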

I don't know how big the SOTA closed-weights models are; local versions of those may come later.

But: to the extent that a model running on your phone can do your job, your employer will ask "why are we paying you so much?", and now you can't afford the phone.

Even if the SOTA is always running ahead of local models, Claude Code could cost 1500 times as much as it does now and the average American business would still be asking: "So why did we hire a junior? You say juniors learn when we train them; I don't care, let some other company do that. We only hire mid-level and up now."

(The threshold is lower than 1500 elsewhere; I just happened to have recently seen the average US pay for junior-grade software developers, $85k*, which makes the tool roughly 350x cheaper, plus my own observation that these models are not only junior quality but also produce output much faster than a junior would. Rough arithmetic is sketched after the footnote.)

* Though note: while searching for a citation, I saw results claiming anywhere from $55k to $97.7k.
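
A rough reconstruction of the arithmetic behind the 350x and 1500x figures above; the $85k salary is from the comment, while the implied subscription price and speed multiplier are my inferences, not stated facts:

```
# Back-of-envelope for the 350x / 1500x figures.
junior_salary = 85_000                      # average US junior dev pay (from the comment)
cost_ratio = 350                            # "350x cheaper" (from the comment)

implied_annual_sub = junior_salary / cost_ratio
print(f"implied subscription ≈ ${implied_annual_sub:.0f}/yr (~${implied_annual_sub / 12:.0f}/mo)")
# ≈ $243/yr, i.e. roughly a $20/month plan (my inference)

# The 1500x break-even then implies the model is also credited with speed:
speed_multiplier = 1500 / cost_ratio
print(f"implied speed advantage ≈ {speed_multiplier:.1f}x")  # ≈ 4.3x
```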

They would fall behind in the world just like people from developing and poor countries do today.

  • Very few people fall behind at the moment due to lack of access to information. People in poor countries largely have access to the internet now. It doesn’t magically make people educated and economically prosperous.

    • You are arguing the converse: access to information doesn't make people educated, but lack of access definitely puts people at a big disadvantage. And chatbots are not just information; they are tools, and using them takes training, because they hallucinate.

It's not that expensive unless you run millions of tokens through an agent. For use cases where you actually read all the input and output yourself (i.e. an actual conversation), it is insanely cheap (see the rough cost sketch at the end of this thread).

  • Yeah, at my last job, unsupervised dataset-scale transformations accounted for 97% of all spend. We used gemini 2.5 flash in batch/prefill-caching mode on Vertex, and always the latest/brightest model for ChatGPT-style conversations.
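
To put rough numbers on the conversation-vs-agent point: a cost sketch with assumed per-token prices. The rates below are my ballpark assumptions (roughly Gemini 2.5 Flash-like) and the 50% batch discount is likewise an assumption; check current vendor pricing before relying on any of it:

```
# Rough cost sketch: a human-paced chat vs an unsupervised dataset pass.
INPUT_PER_M, OUTPUT_PER_M = 0.30, 2.50      # assumed USD per 1M tokens
BATCH_DISCOUNT = 0.5                        # assumed: batch mode at half price

def cost(in_tok: int, out_tok: int, batch: bool = False) -> float:
    c = in_tok / 1e6 * INPUT_PER_M + out_tok / 1e6 * OUTPUT_PER_M
    return c * (BATCH_DISCOUNT if batch else 1.0)

# A long conversation you actually read: ~50k tokens in, ~10k out.
print(f"chat:  ${cost(50_000, 10_000):.3f}")        # ≈ $0.040
# An unsupervised dataset-scale run: 2B tokens in, 400M out, batched.
print(f"batch: ${cost(2_000_000_000, 400_000_000, batch=True):,.0f}")  # ≈ $800
```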