Comment by sgc
20 hours ago
As far as I can tell, this type of model requires 640GB+ of memory at FP8, so it could likely run in 320GB+ at FP4 or similar. That would be 3 Nvidia DGX Sparks (128GB each), or about $12k of hardware. Is that correct? If so, it could make perfect sense for a small business.
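For anyone who wants to check the arithmetic, here is a rough sketch in Python. It assumes 128GB of unified memory per Spark and treats the 640GB FP8 figure as weights only. The `overhead_fraction` parameter and its 20% value are hypothetical, a stand-in for memory lost to the OS, KV cache, and activations, not a measured number.

```python
import math

def devices_needed(weights_gb, device_gb, overhead_fraction=0.0):
    """Smallest number of devices whose usable memory covers the weights.

    overhead_fraction is the share of each device's memory assumed to be
    unavailable for weights (OS, KV cache, activations). Hypothetical knob.
    """
    usable_gb = device_gb * (1 - overhead_fraction)
    return math.ceil(weights_gb / usable_gb)

FP8_GB = 640          # figure quoted in the comment above
FP4_GB = FP8_GB / 2   # half the bits per weight, so roughly half the memory
SPARK_GB = 128        # assumed memory per DGX Spark

# Weights only, no headroom
print(devices_needed(FP4_GB, SPARK_GB))
# Same model with an assumed 20% of each device reserved
print(devices_needed(FP4_GB, SPARK_GB, overhead_fraction=0.2))
```

With zero headroom the estimate comes out at three devices, and any meaningful reserve pushes it up, so the count is quite sensitive to how much memory you assume is actually free for weights.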
I'd think the performance would be abysmal spread across four Sparks, though I guess MoE mitigates that somewhat. It's still better to just pay for it in the cloud. (I've spent about $4k on local compute for AI experimentation, but I don't think it pays for itself; I just like tinkering.)
You probably need four of them in practice.