← Back to context

Comment by coder543

1 day ago

> The systems but old but I’m seeing 11tks 27b, 15tks 35b MoE

If that's accurate, then you must be doing something wrong/weird. On a single RTX 3090, I'm seeing substantially higher performance. Dual GPU won't necessarily give a ton of performance improvement, but it shouldn't hurt performance.

With llama-bench, I just measured Qwen3.6-27B at 41 tok/s and Qwen3.6-35B-A3B at 153 tok/s on one RTX 3090. (Those results are without MTP. With MTP, I'm seeing about 65 to 70 tok/s for Qwen3.7-27B.)

I'm using the unsloth UD-Q4_K_XL quant. If you're using bf16 for some reason, that could explain the low performance and inability to have enough context despite having 48GB of VRAM, I guess, but... don't do that.