Comment by DiabloD3

1 day ago

The recommended values for Qwen 3.6 are `--temp 1.0 --top-p 0.95 --top-k 20 --min-p 0.00` in thinking mode, `--temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00` for coding/tool-calling tasks, and `--temp 0.7 --top-p 0.8 --top-k 20 --presence-penalty 1.5 --min-p 0.00` in non-thinking mode.

The options listed match none of these.
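To make the three presets easier to compare side by side, here is a minimal Python sketch that stores them as data and flattens one into an argument list. The flag names and values are the ones quoted above; the preset names (`thinking`, `coding`, `non_thinking`) and the helper `sampling_args` are made up for illustration and are not part of any real tool.

```python
# Sampling presets as quoted in the comment above.
# The preset names and this helper are illustrative assumptions.
SAMPLING_PRESETS = {
    "thinking": {"--temp": 1.0, "--top-p": 0.95, "--top-k": 20, "--min-p": 0.0},
    "coding": {"--temp": 0.6, "--top-p": 0.95, "--top-k": 20, "--min-p": 0.0},
    "non_thinking": {
        "--temp": 0.7,
        "--top-p": 0.8,
        "--top-k": 20,
        "--presence-penalty": 1.5,
        "--min-p": 0.0,
    },
}


def sampling_args(preset: str) -> list[str]:
    """Flatten one preset into a [flag, value, flag, value, ...] list."""
    args: list[str] = []
    for flag, value in SAMPLING_PRESETS[preset].items():
        args += [flag, str(value)]
    return args


if __name__ == "__main__":
    # Print the flags for the coding/tool-calling preset.
    print(" ".join(sampling_args("coding")))
```

Keeping the presets as data makes it easy to diff them: the only differences between the thinking and coding presets are the temperature, while the non-thinking preset also lowers `--top-p` and adds `--presence-penalty`.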

Also, the recommended Qwen MTP settings are `--spec-type draft-mtp --spec-draft-n-max 2`. A value of 3 does not hold up well on Nvidia hardware across different workloads. You can also add `ngram-mod`, but after `draft-mtp`; however, the default `ngram-mod` settings aren't well tuned, and you want `--spec-ngram-mod-n-min 12 --spec-ngram-mod-n-max 16 --spec-ngram-mod-n-match 6` (defaults are 48, 64, 24; the ratio is good, the magnitude is suboptimal).
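The "same ratio, smaller magnitude" remark can be checked with a few lines of arithmetic. This sketch assumes the defaults (48, 64, 24) are listed in the same order as the flags (n-min, n-max, n-match); `reduced_ratio` is a helper written just for this example.

```python
from functools import reduce
from math import gcd


def reduced_ratio(values):
    """Divide every value by the greatest common divisor of the group."""
    g = reduce(gcd, values)
    return tuple(v // g for v in values)


# Assumed order: n-min, n-max, n-match.
defaults = (48, 64, 24)
suggested = (12, 16, 6)

# Both triples reduce to the same ratio, so only the magnitude differs.
assert reduced_ratio(defaults) == reduced_ratio(suggested) == (6, 8, 3)

# The suggested values are the defaults scaled down by one common factor.
scale = defaults[0] // suggested[0]
assert all(d == s * scale for d, s in zip(defaults, suggested))
```

Both triples reduce to 6:8:3, and the suggested values are exactly the defaults divided by 4.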

Of abliterated Qwen 3.6 27B models, huihui's ends up being the worst. Try heretic instead. https://huggingface.co/mradermacher/Qwen3.6-27B-uncensored-h...

> You can also add `ngram-mod`, but after `draft-mtp`

It looks like there's a hardcoded preference, so the CLI order doesn't matter.

(speculative.cpp:1322-1381): `common_get_enabled_speculative_configs` converts the types vector to a bitmask (order-independent). Then configs are added in a hardcoded priority order:

1. ngram-simple
2. ngram-map-k
3. ngram-map-k4v
4. ngram-mod
5. ngram-cache
6. draft-simple
7. draft-eagle3
8. draft-mtp

(speculative.cpp:1557-1603): `common_speculative_draft` iterates over the implementations in that same hardcoded priority order. Once an implementation produces a draft for a sequence, later implementations skip that sequence.
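The behavior described above can be restated as a small Python sketch. This is not the actual C++ code; it is a simplified model built only from the description in this comment. The names `SpecType`, `PRIORITY`, `enabled_configs`, and `draft_all` are hypothetical, and treating an empty result as "no draft produced" is an assumption.

```python
from enum import IntFlag


class SpecType(IntFlag):
    # Bit positions are arbitrary; only the PRIORITY list decides the order.
    NGRAM_SIMPLE = 1 << 0
    NGRAM_MAP_K = 1 << 1
    NGRAM_MAP_K4V = 1 << 2
    NGRAM_MOD = 1 << 3
    NGRAM_CACHE = 1 << 4
    DRAFT_SIMPLE = 1 << 5
    DRAFT_EAGLE3 = 1 << 6
    DRAFT_MTP = 1 << 7


# Hardcoded priority order as listed in the comment above.
PRIORITY = [
    SpecType.NGRAM_SIMPLE,
    SpecType.NGRAM_MAP_K,
    SpecType.NGRAM_MAP_K4V,
    SpecType.NGRAM_MOD,
    SpecType.NGRAM_CACHE,
    SpecType.DRAFT_SIMPLE,
    SpecType.DRAFT_EAGLE3,
    SpecType.DRAFT_MTP,
]


def enabled_configs(cli_types):
    """Collapse the CLI list into a bitmask, then emit in priority order."""
    mask = SpecType(0)
    for t in cli_types:
        mask |= t  # OR-ing discards the order given on the command line
    return [t for t in PRIORITY if t in mask]


def draft_all(order, impls, sequences):
    """First implementation that yields a draft for a sequence wins."""
    drafts = {}
    for t in order:
        for seq in sequences:
            if seq in drafts:
                continue  # an earlier implementation already drafted this one
            result = impls[t](seq)
            if result:  # assumption: empty/None means "no draft produced"
                drafts[seq] = (t, result)
    return drafts


if __name__ == "__main__":
    a = enabled_configs([SpecType.DRAFT_MTP, SpecType.NGRAM_MOD])
    b = enabled_configs([SpecType.NGRAM_MOD, SpecType.DRAFT_MTP])
    print(a == b)  # both orderings give the same list
```

In this model, passing `draft-mtp` before or after `ngram-mod` yields the same list, with `ngram-mod` tried first and `draft-mtp` used only for sequences where `ngram-mod` produced nothing.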