Comment by mpyne

5 days ago

They're responding to the people doing things like buying the most expensive Mac they can find specifically to do local inference for their AI agents.

Some do it to have control over their ability to use AI. Some do it because they think it will be cheaper to not have to pay a SaaS to generate tokens for them.

For those in the latter camp, though, it doesn't seem to actually be cheaper, at least at current prices. And I don't expect prices to jump drastically, given how much competition there is in model development.
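
For anyone who wants to run the comparison on their own usage, here is a rough break-even sketch. Every input is a placeholder I made up for illustration (hardware price, monthly token volume, API price per million tokens, power cost), not a real quote from any vendor, and it ignores resale value and quality differences between local and hosted models.

```python
def break_even_months(hardware_cost, monthly_tokens, api_price_per_million,
                      monthly_power_cost=0.0):
    """Months until buying local hardware beats paying per token.

    Returns None when the monthly API bill would not exceed the
    running cost of the local machine, i.e. it never pays off.
    All arguments are hypothetical inputs supplied by the caller.
    """
    # What the same token volume would cost from a pay-as-you-go API.
    monthly_api_cost = monthly_tokens / 1_000_000 * api_price_per_million
    # Savings per month from running locally instead.
    monthly_savings = monthly_api_cost - monthly_power_cost
    if monthly_savings <= 0:
        return None
    return hardware_cost / monthly_savings


# Made-up example: $4000 machine, 50M tokens/month, $2 per million, $10/month power.
months = break_even_months(4000, 50_000_000, 2.0, 10.0)
print(months)
```

Plug in your own numbers; with low volume the function returns None, which is the "not actually cheaper" case.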

It's worth paying a premium for the privacy (assuming that llama.cpp and ollama aren't sending my sessions back to the cloud regardless...), and for the peace of mind of not getting a surprise bill.

  • > not getting a surprise bill.

    Correct me if I'm wrong, but I believe this is a feature that only Google has figured out how to implement. All of the other pay-as-you-go token services have a cap you can set: some by monthly spending, some per API key, others by how much you put into the account. I use many, and if configured with auto-purchase disabled, it's not possible to have a "surprise" bill (except for Google!)

You also have control over your costs. It is reasonable to assume that tokens will cost significantly more in the near to medium future as the market consolidates and subsidies decline.

  • Google, Microsoft, Meta, Anthropic, OpenAI, Oracle and others are going to be looking to recoup all the money that they’ve spent to date. Why would the price go down in the future?

    • > Why would the price go down in the future?

      Because price is driven mainly by competition, not by a desire to recoup prior spending.

      Investors aren't doing things out of the sheer goodness of their hearts, so if they could just bump the price up they'd have already forced it up. The very existence of workable local models puts a cap on how high the price can realistically go, and the high level of competition still extant pushes the price ever closer to the actual cost of generating tokens.

    • The AI numbers are huge, but I remember similar arguments about residential high-speed internet. According to Gemini, the "price for internet" is down 12% in real terms (ugh, capitalism!), while speeds are staggeringly faster.

      The providers have spent a fortune on wireless, pulled a lot of fiber/cable, and it's cheaper than it was when it started.