Comment by SwellJoe

3 hours ago

The current dense models from the Gemma 4 or Qwen 3.6 families will run well on a consumer GPU with 32GB of VRAM in a 4-bit quantization (which is a little lossy for Qwen 3.6, less so for Gemma 4, as it has a QAT 4-bit version). Even an Intel Arc B70 will work, though it's worth spending a little more for the AMD Radeon AI Pro 9700, as it'll be something like 40% faster, I think. A dedicated GPU will be faster and cheaper than a Mac Mini. But nothing is a good deal right now; everything is overpriced (except DeepSeek tokens, which cost pennies to run a model that's better than anything you could self-host. DeepSeek V4 Flash, and even Pro, are absurdly cheap, made even cheaper by their bonkers cheap cached token pricing and uniquely effective caching).
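For a rough sense of why 4-bit matters on a 32GB card, here's a back-of-the-envelope sketch. The 30B parameter count and the 6GB allowance for KV cache and runtime overhead are illustrative assumptions on my part, not figures for any specific Gemma or Qwen model, and the function names are made up for this example. Real 4-bit formats also store scales and other metadata, so they land a bit above 4 bits per weight.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_vram(params_billions: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float) -> bool:
    """True if weights plus an assumed overhead budget fit in VRAM.

    overhead_gb is a guess covering KV cache, activations, and runtime
    buffers; it grows with context length, so treat it as a knob.
    """
    needed = weight_memory_gb(params_billions, bits_per_weight) + overhead_gb
    return needed <= vram_gb


if __name__ == "__main__":
    # Hypothetical 30B dense model on a 32GB card with 6GB of overhead.
    for bits in (16, 8, 4):
        gb = weight_memory_gb(30, bits)
        ok = fits_in_vram(30, bits, vram_gb=32, overhead_gb=6)
        print(f"{bits}-bit: ~{gb:.0f} GB weights, fits in 32GB: {ok}")
```

Under those assumptions the 16-bit and 8-bit weights don't fit once overhead is counted, while the 4-bit version leaves room for a decent context window.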
