Comment by heliumtera
9 hours ago
You need 600gb of VRAM + MEMORY (+ DISK) to fit the model (full) or 240 for the 1b quantized model. Of course this will be slow.
Through moonshot api it is pretty fast (much much much faster than Gemini 3 pro and Claude sonnet, probably faster than Gemini flash), though. To get similar experience they say at least 4xH200.
If you don't mind running it super slow, you still need around 600gb of VRAM + fast RAM.
It's already possible to run 4xH200 in a domestic environment (it would be instantaneous for most tasks, unbelievable speed). It's just very very expensive and probably challenging for most users, manageable/easy for the average hacker news crowd.
Expensive AND hard to source high end GPUs, if you manage to source for the old prices around 200 thousand dollars to get maximum speed I guess, you could probably run decently on a bunch of high end machines, for let's say, 40k (slow).
No comments yet
Contribute on Hacker News ↗