Comment by OtherShrezzing
11 days ago
A useful feature would be a slow mode that runs on low-cost compute at spot pricing.
I’ll often kick off a process at the end of my day, or over lunch. I don’t need it to run immediately. I’d be fine if it just ran on their next otherwise-idle GPU at a much lower cost than the standard offering.
https://platform.claude.com/docs/en/build-with-claude/batch-...
> The Batches API offers significant cost savings. All usage is charged at 50% of the standard API prices.
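For reference, here's roughly what submitting a batch looks like with the Anthropic Python SDK, per the linked docs (the model id and request content are just examples):

    import anthropic
    from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
    from anthropic.types.messages.batch_create_params import Request

    client = anthropic.Anthropic()

    # Submit many requests at once; results arrive within 24h at 50% of standard price.
    batch = client.messages.batches.create(
        requests=[
            Request(
                custom_id="nightly-task-1",
                params=MessageCreateParamsNonStreaming(
                    model="claude-3-5-sonnet-20241022",  # example model id
                    max_tokens=1024,
                    messages=[{"role": "user", "content": "Summarize these release notes."}],
                ),
            ),
        ]
    )
    print(batch.id, batch.processing_status)  # poll until "ended", then fetch results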
Can this work for Claude? I think it might be raw API only.
I'm not sure I understand the question? Are you perhaps asking if messages can be batched via Claude Code and/or the Claude web UI?
OpenAI offers that, or at least used to. You can batch all your inference and get much lower prices.
Still do. Great for workloads where it's okay to bundle a bunch of requests and wait some hours (up to 24h, usually done faster) for all of them to complete.
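With the OpenAI SDK it looks roughly like this: requests go into a JSONL file, you upload it, and the batch completes within the chosen window (the file name and model here are placeholders):

    from openai import OpenAI

    client = OpenAI()

    # batch_input.jsonl holds one request per line, e.g.:
    # {"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions",
    #  "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "hi"}]}}
    batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")

    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",  # results usually land well before the deadline
    )
    print(batch.id, batch.status)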
Yep, same. I often wonder why this isn't a thing yet: running some tasks overnight at, say, 50% of the cost. There's the Batches API, but it isn't integrated into e.g. Claude Code.
The discounted Max plans are already on slow mode.
> I’ll often kick off a process at the end of my day, or over lunch. I don’t need it to run immediately. I’d be fine if it just ran on their next otherwise-idle GPU at a much lower cost than the standard offering.
If it's not time-sensitive, why not just run it on CPU/RAM rather than GPU?
Yeah, just run an LLM with over 100 billion parameters on a CPU.
200 GB is an unfathomable amount of main memory for a CPU.
(With apologies for the snark) give gpt-oss-120b a try. It's not fast at all, but it can generate on CPU.
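If you want to try it yourself, a sketch with llama-cpp-python; the GGUF path and quant level are assumptions, and expect very low tokens/sec:

    from llama_cpp import Llama

    # Hypothetical path to a quantized gpt-oss-120b GGUF; a Q4-class quant
    # still needs on the order of 60+ GB of RAM.
    llm = Llama(model_path="./gpt-oss-120b-Q4_K_M.gguf", n_ctx=8192, n_threads=16)

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Write a short paragraph about batch APIs."}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])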
Run what exactly?
I'm assuming GP means 'run inference locally on CPU/RAM'. You can run really big LLMs on local infra; they just do a fraction of a token per second, so it might take all night to get a paragraph or two of text. Mix in things like thinking and tool calls, and it will take a long, long time to get anything useful out of it.
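Back-of-envelope, with assumed numbers rather than benchmarks:

    # Why "all night": assumed CPU decode speed for a ~100B-param model.
    tokens_per_second = 0.5
    visible_output = 500         # "a paragraph or two"
    thinking_and_tools = 20_000  # assumed hidden reasoning + tool-call tokens

    hours = (visible_output + thinking_and_tools) / tokens_per_second / 3600
    print(f"~{hours:.1f} hours")  # ~11.4 hours, i.e. an overnight run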
Does that even work out to be cheaper, once you factor in how much extra power you'd need?
How much extra power do you think you'd need to run an LLM on a CPU (one that still fits in RAM and is useful)? I have a beefy CPU, and if I ran it 24/7 for a month it would only cost about $30 in electricity.
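That $30 figure checks out under plausible assumptions (draw and rate are mine, not measured):

    cpu_watts = 300       # assumed sustained draw of a beefy CPU under load
    hours = 24 * 30       # one month, 24/7
    usd_per_kwh = 0.14    # assumed residential electricity rate

    kwh = cpu_watts * hours / 1000      # 216 kWh
    print(f"${kwh * usd_per_kwh:.2f}")  # ≈ $30.24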