Comment by rhdunn

1 day ago

Yeah. I run LLMs locally, and for me 22B-32B is the largest I'm willing to invest in trying out.

Even though Mistral 4 has 6B active parameters per token (so roughly 3-3.5 tokens' worth of active parameters can be loaded on a 4090), the ~240GB download + storage is pushing the limits of what I can try out locally, especially if you are downloading and evaluating multiple models.
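For anyone who wants to redo the back-of-envelope numbers, here is a rough sketch. The 8-bit weights (1 byte per parameter), the 3-6GB of VRAM reserved for KV cache and buffers, and the 100 Mbit/s link are my assumptions, not published figures; only the 6B active parameters, the 24GB 4090, and the ~240GB download come from above.

```python
def active_sets_in_vram(active_params_b, bytes_per_param, vram_gb, reserved_gb):
    """How many tokens' worth of active parameters fit in VRAM.

    active_params_b is in billions, so params * bytes/param gives GB directly.
    reserved_gb is an assumed allowance for KV cache, activations, etc.
    """
    active_gb = active_params_b * bytes_per_param
    return (vram_gb - reserved_gb) / active_gb


def download_hours(size_gb, link_mbps):
    """Ideal transfer time in hours for size_gb over a link_mbps connection."""
    megabits = size_gb * 8 * 1000  # decimal GB -> megabits
    return megabits / link_mbps / 3600


if __name__ == "__main__":
    # Assumed: 8-bit weights, 3 GB or 6 GB of the 4090's 24 GB held back.
    for reserved in (3, 6):
        sets = active_sets_in_vram(6, 1.0, 24, reserved)
        print(f"reserved {reserved} GB: {sets:.1f} active sets fit")

    # Assumed: 100 Mbit/s link, no protocol overhead.
    print(f"~240 GB download: {download_hours(240, 100):.1f} hours")
```

With those assumptions you land in the 3-3.5 range, and a single model download takes several hours, which adds up quickly when comparing multiple models.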

It also makes it harder for other people to make downstream finetunes, as happened with the older Mistral/Magistral models.

I think machines like the DGX Spark are about to become a lot more common/popular. It’s big enough to run sparse 150-250B MoEs with enough throughput for a single user. DeepSeek v4 Flash is #1 (in terms of usage) on OpenRouter because it’s good enough to be useful. You can run it on a Spark (though it runs better across 2, which is getting up there in cost).