
Comment by guerrilla

20 days ago

> I’ll often kick off a process at the end of my day, or over lunch. I don’t need it to run immediately. I’d be fine if it just ran on their next otherwise-idle GPU at much lower cost than the standard offering.

If it's not time sensitive, why not just run it on CPU/RAM rather than GPU?

Yeah, just run an LLM with over 100 billion parameters on a CPU.

  • 200 GB is an unfathomable amount of main memory for a CPU

    (with apologies for snark,) give gpt-oss-120b a try. It’s not fast at all, but it can generate on CPU.

    • But it's incredibly incapable compared to SOTA models. OP wants high quality output but doesn't need it fast. Your suggestion would mean slow AND low quality output.
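
For reference, the 200 GB figure above can be reproduced with back-of-the-envelope arithmetic. This is only a rough sketch: it counts the weights alone, and the 100-billion parameter count and the bit widths are illustrative assumptions, not measurements of any particular model. KV cache, activations, and runtime overhead come on top.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9

# Assumed: a 100-billion-parameter model stored at different precisions.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(100e9, bits):.0f} GB")
```

At 16 bits per parameter this works out to 200 GB; quantizing to 4 bits cuts it to about a quarter of that, which is why quantized builds are the usual choice for CPU inference.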


Run what exactly?

  • I'm assuming GP means 'run inference locally on GPU or RAM'. You can run really big LLMs on local infra, they just do a fraction of a token per second, so it might take all night to get a paragraph or two of text. Mix in things like thinking and tool calls, and it will take a long, long time to get anything useful out of it.

    • I’ve been experimenting with this today. I still don’t think AI is a very good use of my programming time… but it’s a pretty good use of my non-programming time.

      I ran OpenCode with some 30B local models today and it got some useful stuff done while I was doing my budget, folding laundry, etc.

      It’s less likely to “one shot” apples to apples compared to the big cloud models; Gemini 3 Pro can one shot reasonably complex coding problems through the chat interface. But through the agent interface where it can run tests, linters, etc. it does a pretty good job for the size of task I find reasonable to outsource to AI.

      This is with a high end but not specifically AI-focused desktop that I mostly built with VMs, code compilation tasks, and gaming in mind some three years ago.

    • Yes, this is what I meant. People are running huge models at home now, so I assumed a business could do the same on premises or in a data center, presumably faster... but yeah, it definitely depends on what time scales we're talking about.
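
To put "a fraction of a token per second" on a time scale, here is a minimal sketch of the arithmetic. The throughput and token counts are assumed for illustration only and are not benchmarks of any real setup.

```python
def generation_hours(n_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock hours needed to generate n_tokens at a steady rate."""
    return n_tokens / tokens_per_second / 3600

# Assumed rate: 0.5 tokens/s, i.e. "a fraction of a token per second".
RATE = 0.5

# A couple of paragraphs (~300 tokens, assumed) versus a run that also
# spends tokens on thinking and tool calls (~10,000 tokens, assumed).
for label, tokens in (("short answer", 300), ("with thinking/tools", 10_000)):
    print(f"{label}: {generation_hours(tokens, RATE):.2f} h")
```

Under these assumptions the bare answer is a matter of minutes, while the run with thinking and tool calls takes several hours, so the overnight estimate mostly depends on how many hidden tokens the job burns.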


Does that even work out to be cheaper, once you factor in how much extra power you'd need?

  • How much extra power do you think you would need to run an LLM on a CPU (one that fits in RAM and is still useful)? I have a beefy CPU, and if I ran it 24/7 for a month it would only cost about $30 in electricity.
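
The $30 figure is easy to sanity-check. The sketch below assumes a sustained draw of 280 W and a price of $0.15 per kWh; both numbers are hypothetical, chosen only because they land near the quoted cost, and real values vary by hardware and region.

```python
def monthly_cost_usd(watts: float, usd_per_kwh: float, hours: float = 24 * 30) -> float:
    """Electricity cost of a constant load over a period (default: 30 days)."""
    kwh = watts * hours / 1000  # energy consumed in kilowatt-hours
    return kwh * usd_per_kwh

# Assumed: 280 W sustained draw, $0.15/kWh, running 24/7 for 30 days.
print(f"${monthly_cost_usd(280, 0.15):.2f} per month")
```

With those assumptions the month comes to roughly $30, so the claim is plausible for a desktop-class CPU under constant load.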