Comment by Alifatisk
1 day ago
Alibaba keeps releasing gold content
I just tried Qwen3-Next-80B-A3B on Qwen Chat, and it's fast! The quality seems to match Qwen3-235B-A22B. Quite impressive how they achieved this. Can't wait for the benchmarks at Artificial Analysis
According to Qwen Chat, Qwen3-Next has the following limits:
Maximum context length: 262,144 tokens
Max summary generation length: 32,768 tokens
That's 2x the context length and 4x the summary generation length of Qwen3-235B-A22B, damn
> Qwen3-Next [...] excels in ultra-long-context understanding and complex tasks
Even though their new hybrid architecture is fascinating, I think I'll continue to stick with Qwen2.5-Turbo because it's one of the few models that supports 1M tokens of context. My use case is uploading large PDFs and asking questions across chapters
My take on long context for many frontier models: the issue is not whether it's supported but that accuracy drops drastically as you increase the context. Even if a model claims to support a 10M context, the reality is it doesn't perform well when you saturate it. Curious to hear others' perspectives on this
This is my experience with Gemini. Yes, I really can put an entire codebase and all the docs and pre-dev discussions and all the inter-engineer chat logs in there.
I still see the model becoming more intoxicated as the turn count gets high.
I use repomix to pack a full repository as an XML file and it works wonders. The system prompt is very simple:
please don't add any comments in the code unless explicitly asked to, including the ones that state what you changed. do not modify/remove any existing comments as long as they are valid. also output the full files that are changed (not the untouched ones), and no placeholders like "no change here" etc. do not output the xml parts in the output.xml file. focus on the individual files. before and after outputting code, write which file it would be and the path (not as a comment in the code but instead, before and after outputting code).
Attached is a 400k-token XML file, the output of:
https://pastebin.com/raw/SH6JHteg
Main prompt is a general description of the feature needed and PDF exports from figma.
All done for free in AI Studio, and I consistently get better results than the people using Claude Code.
Agreed. That said, in general a 1M context model has a larger usable window than a 260k context model.
If you read the model card, Qwen3-Next can be extended to 1M context length with YaRN.
> Qwen3-Next natively supports context lengths of up to 262,144 tokens. For conversations where the total length (including both input and output) significantly exceeds this limit, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 1 million tokens using the YaRN method.
Source: https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct#proc...
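For anyone self-hosting, that's roughly a one-line change to the model's config.json. A minimal sketch in Python, assuming the rope_scaling layout shown in the model card; the factor of 4.0 is an assumption (262,144 native tokens x 4 is about 1M):

    import json

    # Patch a local copy of the model's config.json to enable YaRN RoPE scaling.
    # factor 4.0 is an assumption: 262,144 native tokens * 4.0 ~= 1M tokens.
    with open("config.json") as f:
        cfg = json.load(f)

    cfg["rope_scaling"] = {
        "rope_type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 262144,
    }

    with open("config.json", "w") as f:
        json.dump(cfg, f, indent=2)

The usual caveat from Qwen's model cards applies: most frameworks implement static YaRN, which can slightly degrade quality on short inputs, so only enable it when you actually need the longer window.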
> If you read the model card, Qwen3-Next can be extended to 1M context length with YaRN.
I read the article, but as I said, Qwen Chat only provides up to 262k tokens of context, so I'll stick with Qwen2.5-Turbo, which supports 1M tokens.
I am not in a position where I can self-host yet.
Their proprietary models are very good too and go under the radar; they never seem to appear on any benchmarks. Qwen3-Coder-Plus is significantly better than their open-source Qwen3, and Qwen3-Max also rivals the SOTA models
Are these actually different models vs just different names from the open weights releases?
They generally match, except I don't think the Max ones have open-weight releases
How are you prepping the PDF data before shoving it into Qwen?
I just compress the file size as much as possible without losing quality; I didn't even know there were more ways to prep it.
I do sometimes chop the PDF up into smaller PDFs, one per chapter
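If you want to automate the chopping, a minimal sketch with pypdf; the file name and chapter page ranges are made up for illustration:

    from pypdf import PdfReader, PdfWriter

    reader = PdfReader("book.pdf")  # hypothetical input PDF
    # Hypothetical chapter boundaries as (start, end) page indices, 0-based inclusive.
    chapters = [(0, 24), (25, 61), (62, 103)]

    # Write each chapter's page range out as its own PDF.
    for i, (start, end) in enumerate(chapters, 1):
        writer = PdfWriter()
        for page in reader.pages[start : end + 1]:
            writer.add_page(page)
        with open(f"chapter_{i}.pdf", "wb") as out:
            writer.write(out)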
On Linux you can also use pdftotext if you are only concerned with the text.
Not OP, but we use the docling library to extract text and put it in markdown before storing for use with an LLM.
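The basic docling flow is short. A minimal sketch, with a hypothetical file name:

    from docling.document_converter import DocumentConverter

    # Parse the PDF (layout, tables, reading order) into a structured document.
    converter = DocumentConverter()
    result = converter.convert("report.pdf")  # hypothetical input PDF

    # Export the parsed document as Markdown for downstream LLM use.
    with open("report.md", "w", encoding="utf-8") as f:
        f.write(result.document.export_to_markdown())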