Comment by organsnyder

1 day ago

I run Qwen 3.6 on my Framework Desktop 128GB, and it's very performant. I know Framework has had to raise the price since I preordered mine, but they're still well under half the cost of that Macbook.

Can you please explain how you set it up? I run it on my 129G Strix Halo under Arch with Lemonade with OpenCode and it just sits there doing barely anything unless I leave it to run over night. Then it says it thought for 13.7 seconds but was really 15 minutes. Thanks! I am using the 27B dense MTP model quantized by UnSloth with the UD-Q8_K_L if memory serves.

I get ~55 Tok/s on my framework desktop with the 35B A3B q8 model, and so far am also very happy with the coding performance.

  • did you upgrade to MTP?

    • On the MoE versions of these models the MTP versions have only marginal benefit. In my trials the speed-up is <20% (not the ~2x that happens with some other setup/models) and usually more like 10%. Ie. something like 13 -> 15 token/s... on my device.

      I still use the MTP version as it _feels_ slightly better quality, and because the unsloth quantizations I can get have more variety to fit into the various systems at hand... but that's not for the MTP aspect, unfortunately.

      In the article they did have ~2x performance on the 27B (which might be something to retry, though on my Framework that would bring it from 5 -> 10 token/s so still "excrutiating" speed, probably).

      YMMV for sure.