Comment by CamperBob2

4 hours ago

27B dense is not a screamer, even on an RTX 6000, but it will run at full precision with more than enough room left over for context at the model's full native length. You can expect about 30 tokens/second for generation once prompt processing is done. Quants will likely run similarly well on the 16/24/32 GB consumer GPUs.
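For anyone who wants to sanity-check the memory side of that, here is a rough back-of-the-envelope sketch. It assumes "full precision" means 16-bit weights (my assumption, not something stated above) and counts weights only, so KV cache, activations, and runtime overhead all come on top. The helper name `weight_gib` is made up for illustration.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB: params * bits / 8 bytes.

    Weights only. KV cache, activations, and framework overhead are
    NOT included, so treat the result as a lower bound.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Nominal bit widths; real quant formats store extra scale data,
# so actual files run somewhat larger than these figures.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_gib(27, bits):.1f} GiB for weights alone")
```

By this estimate a 27B model needs roughly 50 GiB at 16 bits, about 25 GiB at 8 bits, and under 13 GiB at a nominal 4 bits, which is why a 4-bit quant is plausible on a 16 GB card while 8-bit wants the 32 GB tier. How much context fits in what's left depends on the KV cache size, which this sketch does not model.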

The 3.5 27B model was a strong and capable reasoner, so I have high hopes for this one. Thanks to the team at Qwen for keeping competition in this space alive.