Comment by organsnyder

1 day ago

I run Qwen 3.6 on my Framework Desktop 128GB, and it's very performant. I know Framework has had to raise the price since I preordered mine, but it's still well under half the cost of that MacBook.

SomeHacker44 7 hours ago

Can you please explain how you set it up? I run it on my 128GB Strix Halo under Arch, using Lemonade with OpenCode, and it just sits there doing barely anything unless I leave it to run overnight. Then it says it thought for 13.7 seconds, but it was really 15 minutes. I am using the 27B dense MTP model quantized by Unsloth, UD-Q8_K_L if memory serves. Thanks!

andy99 1 day ago

I get ~55 tok/s on my Framework Desktop with the 35B A3B q8 model, and so far I'm also very happy with the coding performance.

cyanydeez 1 day ago

Did you upgrade to MTP?
- imrehg 12 hours ago
  
  On the MoE versions of these models, the MTP variants give only a marginal benefit. In my trials the speed-up is under 20% (not the ~2x seen with some other setups/models), usually more like 10%, i.e. something like 13 -> 15 tokens/s on my device.
  I still use the MTP version because it _feels_ slightly better in quality, and because the Unsloth quantizations available for it offer more variety to fit the various systems I have at hand. Unfortunately, neither reason has anything to do with MTP itself.
  In the article they did get ~2x on the 27B, which might be worth retrying, though on my Framework that would only take it from 5 -> 10 tokens/s, so probably still "excruciating" speed.
  YMMV for sure.
- andy99 4 hours ago
  
  That was with the MTP version.
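As a quick check on the figures in imrehg's reply above: relative speed-up is simply (new / old) minus 1. A minimal Python sketch using only the tokens/s numbers quoted in the thread (the `speedup` helper is hypothetical, written just for this illustration) shows that 13 -> 15 tokens/s works out to roughly 15%, inside the "under 20%" bound, while 5 -> 10 tokens/s is the ~2x (100%) gain reported in the article:

```python
def speedup(baseline_tps: float, mtp_tps: float) -> float:
    """Relative speed-up of the MTP run over the baseline run, in percent."""
    return (mtp_tps / baseline_tps - 1.0) * 100.0

# Tokens/s figures quoted in the thread (one user's hardware; YMMV).
moe = speedup(13, 15)    # MoE model with MTP: about 15%
dense = speedup(5, 10)   # 27B dense, if the article's ~2x holds: 100%
print(f"MoE: {moe:.1f}%  dense: {dense:.1f}%")
# prints: MoE: 15.4%  dense: 100.0%
```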

bityard 1 day ago

There are several variants of Qwen 3.6. The MoE models are performant on Strix Halo, but the 27B dense model (the one discussed in TFA, and generally regarded as the best of the group in terms of quality) is not nearly as fast: https://kyuz0.github.io/amd-strix-halo-toolboxes/
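One common back-of-envelope explanation for this gap: token generation is usually memory-bandwidth-bound, so each new token has to stream every *active* weight from memory once. The sketch below is a rough ceiling under assumptions that are mine, not the thread's: ~256 GB/s peak memory bandwidth for Strix Halo, ~1.0625 bytes per weight for a Q8_0-style quant (8.5 bits/weight), and "A3B" meaning ~3B parameters active per token. The `max_tokens_per_s` helper is hypothetical.

```python
# Rough decode-speed ceiling for memory-bandwidth-bound inference:
# every generated token streams all *active* weights from memory once.

BANDWIDTH_GB_S = 256.0        # assumption: Strix Halo peak (256-bit LPDDR5X-8000)
BYTES_PER_PARAM_Q8 = 1.0625   # assumption: ~8.5 bits/weight for a Q8_0-style quant


def max_tokens_per_s(active_params_billions: float,
                     bytes_per_param: float = BYTES_PER_PARAM_Q8,
                     bandwidth_gb_s: float = BANDWIDTH_GB_S) -> float:
    """Upper bound on tokens/s; ignores KV cache, activations, compute and MTP."""
    gb_per_token = active_params_billions * bytes_per_param
    return bandwidth_gb_s / gb_per_token


for name, active in [("35B-A3B MoE (~3B active)", 3.0), ("27B dense", 27.0)]:
    print(f"{name}: <= {max_tokens_per_s(active):.0f} tok/s")
# prints:
# 35B-A3B MoE (~3B active): <= 80 tok/s
# 27B dense: <= 9 tok/s
```

Real throughput lands below these ceilings (andy99's ~55 tok/s for the 35B A3B q8 model, for instance), but the roughly 9x difference in bytes read per token is a plausible reason the dense model feels so much slower on the same machine.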