Qwen3.6-27B supports a 1 million token context window.
Of course, you need the right hardware to run a context window that large: on my DGX Spark it takes about 100 GB of memory with a full f16 KV cache on the q4_k_xl model.
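A rough way to see why long contexts eat so much memory is the standard KV cache estimate. For every attention layer that keeps a full cache, each token stores one key and one value vector per KV head, so the cache grows linearly with context length. The sketch below is hedged. The layer, head, and dimension values in `HYPOTHETICAL` are made up for illustration and are NOT Qwen3.6-27B's real architecture. If a model mixes in linear-attention layers, only the full-attention layers would count here. Model weights and compute buffers come on top of this figure.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Estimate KV cache size in bytes for layers that cache full K and V."""
    # Factor 2: one tensor for keys and one for values, per layer and KV head.
    # bytes_per_elem=2 corresponds to an f16 cache; 1 would be an 8-bit cache.
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


def max_context(budget_bytes, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """How many tokens of KV cache fit in a given memory budget."""
    per_token = kv_cache_bytes(n_layers, n_kv_heads, head_dim, 1, bytes_per_elem)
    return budget_bytes // per_token


# HYPOTHETICAL config for illustration only, not the real model's values.
HYPOTHETICAL = dict(n_layers=16, n_kv_heads=4, head_dim=256)

if __name__ == "__main__":
    full = kv_cache_bytes(n_ctx=1_000_000, **HYPOTHETICAL)
    print(f"1M-token f16 KV cache: {full / 2**30:.1f} GiB")  # 61.0 GiB
    # Tokens that fit if 8 GiB is left over for the cache after loading weights.
    print(max_context(8 * 2**30, **HYPOTHETICAL))  # 131072
```

Plugging in the real model's config values would give a more accurate number. Quantizing the cache (smaller `bytes_per_elem`) shrinks it proportionally.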
Got a similar result (my RTX 4070 only has 12 GB). I'm curious whether 24 or 32 GB improves things enough to make it useful.
Try running it on the CPU with system RAM.
It's slower, but you can run them.
Good idea for evaluating the models, thanks.
Prompt more directly instead of leaving it open-ended.