Comment by blagui

7 hours ago

So the sweet spot for dev in 2026 is 64k context windows? Are we back in 2024?

More context will degrade the t/s a lot. On top of that, this is 1 slot.

If you use sub agents, colliding requests will invalidate the KV cache and make it even slower.

So in the real world, with 256k (the max Qwen offers) and 3-4 slots, the numbers are very different.
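To put rough numbers on the memory side of that, here is a back-of-the-envelope sketch. It assumes a plain full-attention transformer with grouped-query attention and an fp16 KV cache, and it assumes every slot gets its own full 256k window. The layer count, KV head count, and head size are hypothetical values picked for illustration, not the specs of any particular Qwen model.

```python
def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes needed to hold K and V for n_ctx tokens across all layers."""
    # Factor 2 = one K tensor plus one V tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_ctx


# Hypothetical model shape (assumption, not taken from a real model card).
N_LAYERS, N_KV_HEADS, HEAD_DIM = 48, 4, 128


def gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 2**30


# One slot at 64k tokens, the setup being benchmarked.
one_slot_64k = kv_cache_bytes(64 * 1024, N_LAYERS, N_KV_HEADS, HEAD_DIM)

# Four slots at 256k tokens each, closer to an agent with sub agents.
four_slots_256k = 4 * kv_cache_bytes(256 * 1024, N_LAYERS, N_KV_HEADS, HEAD_DIM)

print(f"1 slot  x 64k : {gib(one_slot_64k):.1f} GiB of KV cache")
print(f"4 slots x 256k: {gib(four_slots_256k):.1f} GiB of KV cache")
print(f"ratio: {four_slots_256k / one_slot_64k:.0f}x")
```

The ratio does not depend on the model shape: 4x the context times 4 slots is 16x the KV cache, before counting any t/s loss from attending over longer contexts. Quantizing the cache or using a hybrid attention architecture would shrink the absolute numbers.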

This is the major issue with so many posts about local models: they don't benchmark real-world use with real context sizes, and they don't take any of this into account.

The issue with 1 slot is that you lose the ability to use sub agents when exploring, so everything ends up in the main agent's context, overloading it and triggering compaction, and oh boy, with 64k context that compaction will be an endless loop.

What tasks would you really be able to do with 64k context and 1 agent? Some quick edits for sure, but not complex planning where you need to ingest a lot of files and end up losing 80% of the ingested files to compaction.
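A toy simulation of that compaction loop, as a rough sketch only. Everything here is an assumption for illustration: the file sizes, the tokens reserved for system prompt and tool output, and the idea that each compaction keeps 20% of what was in context (mirroring the 80% loss above). Real agents compact differently.

```python
def simulate_ingest(file_tokens, n_ctx, reserved, keep_percent):
    """Feed files into one agent context, compacting whenever it would overflow.

    Returns (number_of_compactions, tokens_in_context_at_end).
    """
    if not 0 <= keep_percent < 100:
        raise ValueError("keep_percent must be in [0, 100)")
    budget = n_ctx - reserved  # tokens actually available for file content
    used = 0
    compactions = 0
    for size in file_tokens:
        if size > budget:
            raise ValueError("a single file does not fit in the context")
        # Compact until the next file fits (integer math avoids float drift).
        while used + size > budget:
            used = used * keep_percent // 100
            compactions += 1
        used += size
    return compactions, used


# Hypothetical workload: 40 files of about 8k tokens each.
FILES = [8_000] * 40
RESERVED = 16_000  # assumed system prompt + tool definitions + output room

c64, left64 = simulate_ingest(FILES, 64 * 1024, RESERVED, keep_percent=20)
c256, left256 = simulate_ingest(FILES, 256 * 1024, RESERVED, keep_percent=20)

print(f"64k : {c64} compactions, {left64} tokens left in context")
print(f"256k: {c256} compactions, {left256} tokens left in context")
```

With these made-up numbers the 64k context compacts every few files, while the 256k context gets through most of the workload before its first compaction. Sub agents on extra slots would keep the exploration out of the main context entirely, which this sketch does not model.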