Comment by brulard

1 month ago

Did I? Not only are you comparing apples to oranges, you even provide misleading numbers.

A 3090 gets 20-30 tokens per second on dense ~30B models (QwQ 32B, Gemma 3 27B Q4), similar to an M3 Ultra. If you are talking about Qwen3-Coder 30B (MoE), then both the 3090 and the M3 Ultra are at around 70 tok/s.

But even if you were right about the speed - which you are not - speed is pointless if you need a large model that won't fit into your VRAM.
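
To put rough numbers on the VRAM point, here is a back-of-the-envelope sketch. The ~4.5 bits per weight for a Q4-style quant, the 2 GiB allowance for KV cache and runtime buffers, and the 120B model are all my assumptions for illustration, not measurements; the 24 GiB figure is the 3090's VRAM.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float, overhead_gib: float = 2.0) -> bool:
    """Rough check: weights plus an assumed fixed overhead must fit in VRAM.

    overhead_gib is a guess for KV cache and buffers; real usage grows
    with context length, so treat this as a lower bound.
    """
    return weight_gib(params_billion, bits_per_weight) + overhead_gib <= vram_gib


if __name__ == "__main__":
    VRAM_3090_GIB = 24   # RTX 3090
    Q4_BITS = 4.5        # assumed average bits per weight for a Q4-style quant

    # 32B and 27B are the dense sizes mentioned above; 120B is hypothetical.
    for size in (27, 32, 120):
        need = weight_gib(size, Q4_BITS)
        verdict = "fits" if fits(size, Q4_BITS, VRAM_3090_GIB) else "does not fit"
        print(f"{size}B: ~{need:.1f} GiB of weights, {verdict} in {VRAM_3090_GIB} GiB")
```

Under these assumptions the ~30B class fits on a single 3090 with some room left for context, while anything far larger has to be offloaded or split across cards, and that is the point where tok/s on the small models stops mattering.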