Comment by weird-eye-issue

18 days ago

Yeah, just run an LLM with over 100 billion parameters on a CPU.

200 GB is an unfathomable amount of main memory for a CPU

(With apologies for the snark:) give gpt-oss-120b a try. It's not fast at all, but it can generate on a CPU.

  • But it's far less capable than SOTA models. OP wants high-quality output and doesn't need it fast; your suggestion would give them slow AND low-quality output.