Comment by CamperBob2
21 hours ago
A pair of RTX6000 cards will give you a good performance boost due to tensor parallelism, though. I haven't tried the newest predictive quants, but I see about 35 tps when running the 8-bit Qwen 3.6 27B model on one board and about 50 tps on two. You could probably get close to 100 tps on an optimized setup with the latest GGUFs.
Also, the 4-bit quants of MiniMax 2.7 will run at 100 tps or so with two cards, which is pretty decent. It doesn't go any faster at all with 4 GPUs from what I've seen, so if you don't actively need 384 GB of VRAM, 2x RTX6000 is a good place to be.
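For a rough sense of what those numbers mean in scaling terms, here's a quick back-of-the-envelope sketch. It only uses the figures quoted above (35 tps on one card, 50 tps on two, 384 GB across 4 GPUs); the function name and the assumption that VRAM is split evenly across the four cards are mine, purely for illustration.

```python
def scaling(single_tps: float, multi_tps: float, n_gpus: int) -> tuple[float, float]:
    """Return (speedup, parallel efficiency) relative to a single card."""
    speedup = multi_tps / single_tps
    efficiency = speedup / n_gpus  # 1.0 would be perfect linear scaling
    return speedup, efficiency

# 8-bit Qwen 3.6 27B figures quoted above: ~35 tps on one card, ~50 tps on two
speedup, efficiency = scaling(35.0, 50.0, 2)
print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")

# Assumption: the 384 GB figure for 4 GPUs is split evenly across the cards
per_card_vram_gb = 384 // 4
two_card_vram_gb = 2 * per_card_vram_gb
print(f"per card: {per_card_vram_gb} GB, two cards: {two_card_vram_gb} GB")
```

So going from one card to two is roughly a 1.4x speedup, well short of linear, which fits with a third and fourth card not buying any extra speed.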
You can get 70-80 tps on qwen3.6-27b f16 with MTP (multi-token prediction) on a single card.