Comment by guerrilla

20 days ago

> I’ll often kick off a process at the end of my day, or over lunch. I don’t need it to run immediately. I’d be fine if it just ran on their next otherwise-idle GPU at much lower cost than the standard offering.

If it's not time sensitive, why not just run it on CPU/RAM rather than GPU?

12 comments

weird-eye-issue 20 days ago

Yeah, just run an LLM with over 100 billion parameters on a CPU.

kristjansson 20 days ago
200 GB is an unfathomable amount of main memory for a CPU.
(With apologies for the snark.) Give gpt-oss-120b a try. It’s not fast at all, but it can generate on CPU.
- awestroke 19 days ago
  
  But it's incredibly incapable compared to SOTA models. OP wants high quality output but doesn't need it fast. Your suggestion would mean slow AND low quality output.
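The 200 GB figure above is consistent with roughly 100 billion parameters stored at 16 bits each. A minimal sketch of that arithmetic follows. The function name and the bit widths are illustrative assumptions, and the estimate covers weights only (no KV cache, activations, or runtime overhead), so real requirements will be higher.

```python
def weights_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to hold the weights alone, in decimal GB.

    Ignores KV cache, activations, and runtime overhead (assumption).
    """
    return num_params * bits_per_param / 8 / 1e9


# Hypothetical 100-billion-parameter model at a few illustrative precisions.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weights_memory_gb(100e9, bits):.0f} GB")
```

Under these assumptions, halving the bits per parameter halves the weight footprint, which is why quantized models are the usual choice for CPU inference.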
  

bethekidyouwant 20 days ago

Run what exactly?

all2 20 days ago
I'm assuming GP means 'run inference locally on GPU or RAM'. You can run really big LLMs on local infra; they just do a fraction of a token per second, so it might take all night to get a paragraph or two of text. Mix in things like thinking and tool calls, and it will take a long, long time to get anything useful out of it.
- hxtk 19 days ago
  
  I’ve been experimenting with this today. I still don’t think AI is a very good use of my programming time… but it’s a pretty good use of my non-programming time.
  I ran OpenCode with some 30B local models today and it got some useful stuff done while I was doing my budget, folding laundry, etc.
  Compared apples to apples, it’s less likely to “one shot” a task than the big cloud models; Gemini 3 Pro can one shot reasonably complex coding problems through the chat interface. But through the agent interface, where it can run tests, linters, etc., it does a pretty good job for the size of task I find reasonable to outsource to AI.
  This is with a high end but not specifically AI-focused desktop that I mostly built with VMs, code compilation tasks, and gaming in mind some three years ago.
- guerrilla 20 days ago
  
  Yes, this is what I meant. People are running huge models at home now, so I assumed a business could do it on premises or in a data center, presumably faster... but yeah, it definitely depends on what time scales we're talking about.
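To put the "time scales" point in rough numbers, here is a small sketch of how long a fixed amount of output takes at different generation speeds. The token count and the rates are hypothetical placeholders, not measurements of any particular model or machine.

```python
def generation_hours(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock hours to generate num_tokens at a steady rate."""
    return num_tokens / tokens_per_second / 3600


# Assumption: "a paragraph or two" is about 300 tokens.
PARAGRAPH_TOKENS = 300

# Hypothetical generation speeds, in tokens per second.
for rate in (0.01, 0.1, 0.5):
    hours = generation_hours(PARAGRAPH_TOKENS, rate)
    print(f"{rate} tok/s: {hours:.2f} hours")
```

With these assumed numbers, 300 tokens at 0.01 tokens per second takes a bit over 8 hours, while 0.5 tokens per second finishes in about 10 minutes. Thinking tokens and tool calls add to the token count, so the total scales up accordingly.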
  

gruez 20 days ago

Does that even work out to be cheaper, once you factor in how much extra power you'd need?

HumanOstrich 19 days ago

How much extra power do you think you would need to run an LLM on a CPU (one that fits in RAM and is still useful)? I have a beefy CPU, and if I ran it 24/7 for a month it would only cost about $30 in electricity.
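For anyone who wants to check a figure like that against their own setup, a minimal sketch of the calculation is below. The 250 W average draw and the $0.17/kWh rate are assumptions picked for illustration; neither number comes from the comment above.

```python
def monthly_cost_usd(avg_watts: float, usd_per_kwh: float, hours: float = 24 * 30) -> float:
    """Electricity cost of a constant load over a 30-day month by default."""
    kwh = avg_watts / 1000 * hours
    return kwh * usd_per_kwh


# Hypothetical inputs: 250 W average draw, $0.17 per kWh.
print(f"${monthly_cost_usd(250, 0.17):.2f} per month")
```

With those assumed inputs the load uses 180 kWh in a month, which lands close to $30. Plug in your own wattage and local rate to compare against a cloud bill.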