Comment by sleepyeldrazi

6 hours ago

Have you tested Qwen3.6 35B? Putting aside the capability claims for that model (which I support, but which are not my point here), that 35B has a smaller active parameter count than the Gemma 4 26B, potentially making both prefill and decode faster out of the box. It also has MTP heads built into the model, and they are well supported (you may need to make sure you download a quant that didn't strip them off, as some do to save space). I would be curious to see your numbers there too. And if you do test this, please go for a clean release and not a fine-tuned one.