Comment by supermatt
4 days ago
Jeff, this is the second time you have been given a prosumer-level cluster built pretty much for local LLM inference, and on both occasions you have run benchmarks without batching.
If you still have the hardware (this and the Mac cluster) can you PLEASE get some advice and run some actually useful benchmarks?
Batching on a single consumer GPU often yields 3-4x the throughput. We have literally no idea what batching looks like on a $10k+ cluster, and finding out otherwise means dropping the cash ourselves.
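For intuition on where a 3-4x figure can come from: single-stream decoding is typically memory-bandwidth bound, because every decode step streams the full model weights from VRAM to produce one token. Batching amortizes that weight traffic across sequences. A rough back-of-envelope model (all numbers here are hypothetical, chosen only to illustrate the shape of the effect):

```python
# Rough model of why batching raises decode throughput (illustrative only).
# Decoding one token requires streaming all model weights from memory; that
# cost is shared by every sequence in the batch, while per-sequence traffic
# (KV-cache reads, activations) scales with batch size.

def decode_throughput(batch_size, weight_bytes, per_seq_bytes, bandwidth_gbs):
    """Tokens/sec under a simple memory-bandwidth-bound model."""
    bytes_per_step = weight_bytes + batch_size * per_seq_bytes
    step_time = bytes_per_step / (bandwidth_gbs * 1e9)  # seconds per decode step
    return batch_size / step_time  # one token per sequence per step

# Hypothetical numbers: ~8 GB of weights (e.g. an 8B model at 8-bit),
# ~2 GB of KV-cache traffic per sequence per step, ~1000 GB/s bandwidth.
single = decode_throughput(1, 8e9, 2e9, 1000)
batched = decode_throughput(8, 8e9, 2e9, 1000)
print(f"batch=1: {single:.0f} tok/s, batch=8: {batched:.0f} tok/s, "
      f"speedup: {batched / single:.1f}x")
# → batch=1: 100 tok/s, batch=8: 333 tok/s, speedup: 3.3x
```

Real clusters add interconnect latency and scheduling overhead on top of this, which is exactly why measured batched numbers on this hardware would be so valuable.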