Comment by plipt
4 days ago
I don't think the Flash model discussed in the article is 30B
Their benchmark table shows it beating Qwen3-235B-A22B
Does "Flash" in the name of a Qwen model indicate a model-as-a-service and not open weights?
Flash is a closed-weight version of https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct (it is 30B, but with additional training on top of the open-weight release). They deploy the Flash version on Qwen's own chat.
Thanks
Was it obvious to you from the article that it was closed-weight? Trying to understand why I was confused. I hadn't seen the "Flash" designation before.
Also, can a 30B model beat a semi-recent 235B with just some additional training?
They released a Flash variant alongside the original open-weight release. It is also mentioned in Section 5 of the paper: https://arxiv.org/pdf/2509.17765
For the evals, it was probably just trained on a lot of benchmark-adjacent datasets compared to the 235B model. A similar thing happened with another model today: https://x.com/NousResearch/status/1998536543565127968 (a 30B model trained specifically to do well at maths gets near-SOTA scores)