Comment by miki123211

21 hours ago

This is, sadly, obvious and inevitable in retrospect.

The two major drivers of inference costs are GPUs and electricity. You can't get cheaper GPUs, but you can keep the ones you have from sitting idle: run them 24/7, process user B's request while user A is thinking, and handle many requests in parallel, none of which you can do as an individual. You can get cheaper electricity... by moving, and it's much easier to move your AI workload than to move yourself.

This is a completely different dynamic than renting houses or apartments, as you can't really rent out the same house to different people at different times of day.
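A rough sketch of the utilization argument. Every number here (a $30,000 GPU, a 3-year life, 1 kW draw, 1,000 tok/s when busy, the electricity prices) is a hypothetical placeholder, and idle power draw is ignored. In this toy model the electricity cost per token doesn't depend on utilization at all. Utilization only changes how many tokens the fixed hardware cost is spread over, while location only changes the electricity term.

```python
HOURS_PER_YEAR = 365 * 24


def cost_per_million_tokens(
    gpu_price_usd: float,
    lifetime_years: float,
    power_kw: float,
    usd_per_kwh: float,
    busy_tokens_per_sec: float,
    utilization: float,
) -> float:
    """Hardware amortization plus electricity, per 1M generated tokens."""
    # The purchase price is spread over every hour of the GPU's life,
    # whether it is busy or idle.
    hardware_per_hour = gpu_price_usd / (lifetime_years * HOURS_PER_YEAR)
    # Simplification (assumption): power is only drawn while busy.
    power_per_hour = power_kw * usd_per_kwh * utilization
    tokens_per_hour = busy_tokens_per_sec * 3600 * utilization
    return (hardware_per_hour + power_per_hour) / tokens_per_hour * 1_000_000


# Hypothetical inputs, for illustration only.
GPU = dict(gpu_price_usd=30_000, lifetime_years=3, power_kw=1.0,
           busy_tokens_per_sec=1_000)

for util in (0.05, 0.25, 0.8):
    for price in (0.20, 0.05):
        c = cost_per_million_tokens(usd_per_kwh=price, utilization=util, **GPU)
        print(f"utilization={util:.0%} electricity=${price:.2f}/kWh -> ${c:.2f} per 1M tokens")
```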

cootsnuck 20 hours ago

Yea. LLM inference requires batch processing to have a shred of hope of being cost-efficient. Batch processing requires a not-insignificant amount of scale (but probably not as much as people think).

I'm very pro local models, but not to reach parity with SoTA frontier models. Just contextually trained small models doing smaller, specific tasks.

Trying to run bigger LLMs for an individual user to do big tasks is not going to be a good time.
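A toy roofline sketch of why batching matters for decoding, under stated assumptions: the weights are read from memory once per decode step and shared by the whole batch, compute is roughly 2 FLOPs per parameter per token, and KV-cache traffic is ignored. The hardware figures are hypothetical round numbers, not any specific GPU.

```python
# Hypothetical hardware/model numbers, for illustration only.
WEIGHT_BYTES = 70e9      # e.g. a 70B-parameter model stored at 1 byte/param
MEM_BW = 3e12            # bytes/s of memory bandwidth
FLOPS_PER_TOKEN = 140e9  # ~2 FLOPs per parameter per generated token
PEAK_FLOPS = 1e15        # FLOP/s the accelerator can sustain


def decode_step_seconds(batch: int) -> float:
    # All weights are streamed from memory once per step and shared by the
    # whole batch; compute grows linearly with the number of sequences.
    # KV-cache traffic and other overheads are ignored.
    memory_time = WEIGHT_BYTES / MEM_BW
    compute_time = batch * FLOPS_PER_TOKEN / PEAK_FLOPS
    return max(memory_time, compute_time)


def throughput(batch: int) -> tuple[float, float]:
    step = decode_step_seconds(batch)
    per_request = 1 / step    # tokens/s each user sees
    aggregate = batch / step  # tokens/s the whole accelerator produces
    return per_request, aggregate


for batch in (1, 8, 64, 256, 1024):
    per_req, total = throughput(batch)
    print(f"batch={batch:5d}  per-request={per_req:6.1f} tok/s  aggregate={total:8.1f} tok/s")
```

With a batch of one, the accelerator is memory-bound and its compute mostly sits idle. Aggregate throughput grows almost linearly with batch size until compute becomes the bottleneck, after which each individual request slows down.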

MichaelZuo 4 hours ago

Wasn't this pretty evident to anyone who knew even a bit about inference?
Idk what people were thinking. I’ve never seen anyone offer a plausible way to sidestep batch processing, for example.

zozbot234 17 hours ago

You can definitely run many requests in parallel as a single user, you just have to be OK with a significant slowdown for any single request. Cloud inference can't reach that ratio of total throughput per hardware cost since they are heavily incented to get the most expensive hardware available and to then minimize latency (and RAM occupation over time) even at the cost of throughput. Running slower inference with cheaper hardware is just not workable in a cloud setting.

PowerElectronix 16 hours ago

On top of that, AI providers are also eating a big loss on the service.

tempay 16 hours ago
Are they? I only ever see unsubstantiated claims for this, whereas I see many justifications that inference is comfortably profitable in isolation.
- tomelders 11 hours ago
  
  SpaceX has disclosed in its IPO documents that it's losing $2bln a quarter on AI, and rising.
  Anthropic told the Department of War (née Defense) that they'd made $5bln in total, which is a lot, LOT less than what they're spending.
  We'll see what's in OpenAI's IPO later this year, I guess. I'll be very surprised if they're losing less than $100bln a year.
  
- ai_fry_ur_brain 16 hours ago
  
  It's basic math: go calculate the max sessions for a certain tps on any hardware, then Session# * tps * 86400 (secs in a day) * 30 days.
  You'll realize real quick it's not profitable. You can't just say things you don't like to hear are unsubstantiated without verifying them.
  Not to mention subscriptions: $2mm in GPUs being handed out for 5 hrs a day for $200 a month.
  I could just as easily say that everyone who says it's profitable is making unsubstantiated claims lol.
  
- exploderate 8 hours ago
  
  Especially since their costs might be multi-year investments. It's too early to judge the quality of those investments.
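The formula from ai_fry_ur_brain's comment, written out as a small sketch. The session count, per-session speed and $60k monthly cost are made-up inputs, and the `utilization` factor is an addition not in the comment: at 1.0 it assumes every session is busy around the clock, which is the most favorable case for the operator.

```python
SECONDS_PER_DAY = 86_400


def tokens_per_month(sessions: int, tps_per_session: float,
                     days: int = 30, utilization: float = 1.0) -> float:
    # sessions * tps * 86400 * 30, scaled by the fraction of time the
    # sessions are actually busy (utilization=1.0 is the 24/7 upper bound).
    return sessions * tps_per_session * SECONDS_PER_DAY * days * utilization


def break_even_usd_per_million(monthly_cost_usd: float, monthly_tokens: float) -> float:
    # Price per 1M tokens at which revenue just covers the monthly cost.
    return monthly_cost_usd / (monthly_tokens / 1_000_000)


# Hypothetical inputs: 64 concurrent sessions at 40 tok/s each,
# and $60k/month of amortized hardware, power and hosting.
for util in (1.0, 0.5, 0.2):
    toks = tokens_per_month(64, 40, utilization=util)
    price = break_even_usd_per_million(60_000, toks)
    print(f"utilization={util:.0%}: {toks:,.0f} tokens/month, break-even ${price:.2f}/1M tokens")
```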
solumunus 16 hours ago
Supposedly Anthropic just reported that they’re operationally profitable. So maybe not?
- akho 15 hours ago
  
  "operationally" implies that capex (which I would assume includes datacenters, gpus, and r&d) is not in. So the big news is that they can now pay for electricity and sysadmin.
  

adrianN 19 hours ago

Historically it was not uncommon for beds to be rented out to multiple people.

bredren 15 hours ago

The word for this type of boarding is “flophouse.”
This is the type of place one might be “waiting for the other shoe to drop.” Which carries a variety of potential meanings in this moment of AI.
Tangentially related: Mack and the boys lived in the “Palace Flophouse and Grill” in Cannery Row.
I suppose I must have looked up flophouse when reading all the Steinbeck I could get my hands on, and it’s stuck with me.
eecc 16 hours ago

It’s unfortunately still common practice among irregular agricultural workers in many parts of the world (I’m Italian, so I definitely remember news about busts in southern Italy).
consp 16 hours ago

See military submarines, for a modern version.
AdamN 12 hours ago

Yeah, there are good accounts of this in Down and Out in Paris and London, and also in one of Hemingway's books; I forget which one.

Unit327 17 hours ago

It also doesn't help that they probably sell tokens below cost.

graemep 14 hours ago

High usage seems to change the economics. The author of the article had a payback period of about 14 months which is excellent by any standards and an order of magnitude better than rent vs buy for a house in most places.
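A minimal payback-period sketch showing why usage volume dominates. The hardware price, API price per 1M tokens and electricity bill are hypothetical, not the article author's numbers. The monthly saving is the API bill you no longer pay minus what it costs to run the box.

```python
import math


def payback_months(hardware_usd: float, tokens_per_month: float,
                   api_usd_per_million: float, running_usd_per_month: float) -> float:
    # Monthly saving = API bill avoided minus local running cost (power etc.).
    saving = tokens_per_month / 1_000_000 * api_usd_per_million - running_usd_per_month
    # If running locally costs more than the API bill it replaces, it never pays back.
    return hardware_usd / saving if saving > 0 else math.inf


# Hypothetical: $5,000 of hardware, $2 per 1M tokens via an API,
# $100/month of electricity; compare heavy and light usage.
for monthly_tokens in (300e6, 150e6, 40e6):
    print(f"{monthly_tokens / 1e6:.0f}M tokens/month -> "
          f"{payback_months(5_000, monthly_tokens, 2.0, 100):.1f} months")
```

Halving usage here more than doubles the payback period, because the fixed running cost eats a larger share of a smaller saving; at low enough usage the hardware never pays for itself.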

dpark 7 hours ago

> You can't get cheaper GPUs

You absolutely can. OpenAI et al. are paying a fortune for GPUs, but they are not paying retail prices.

The entire business model of retail is to sell above cost.