Comment by root_axis
1 day ago
Funny enough, Anthropic just went GA with 1m context claude that has supposedly solved the lost-in-the-middle problem.
Just for anyone else who hadn't seen the announcement yet: the 1M context is now the same price as the previous 256K context, unlike the beta, where Anthropic charged extra for the 1M window:
https://x.com/claudeai/status/2032509548297343196
As for retrieval, the post shows Opus 4.6 at 78.3% needle-retrieval success in the 1M window (compared with 91.9% at 256K), and Sonnet 4.6 at 65.1% in the 1M window (compared with 90.6% at 256K).
Aren't these numbers really bad? ~80% needle retrieval means every fifth memory is akin to a hallucination.
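Back-of-the-envelope on that (assuming, as these evals usually do, that each planted needle is scored as simply retrieved or not - the miss rate is just one minus the success rate):

    # Quick arithmetic on the rates quoted above (hypothetical scoring:
    # each needle either comes back or it doesn't).
    for model, rate in [("Opus 4.6 @ 1M", 0.783), ("Sonnet 4.6 @ 1M", 0.651)]:
        miss = 1 - rate
        print(f"{model}: misses {miss:.1%} of needles, roughly 1 in {round(1 / miss)}")

So 78.3% works out to roughly one needle in five missed, and 65.1% to roughly one in three.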
I don't think it quite means that - happy to be corrected on this, but I think it's more like what percentage of the context the model can still pay attention to. If you only remembered "cat sat mat", that's only 50% of the words in "the cat sat on the mat", but you've still paid attention to enough of the right things to be able to fully understand and reconstruct the original. 100% would be akin to memorizing and being able to recite, in order, every single word that someone said during their conversation with you.
But even if I've misunderstood how attention works, the numbers are relative: GPT 5.4 only achieves 36% needle retrieval at 1M, and Gemini 3.1 & GPT 5.4 are only getting 80% even at the 128K mark, yet I think people would still say those models are highly useful.
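For anyone curious what these benchmarks actually measure, here's a minimal needle-in-a-haystack harness sketch. `ask_model` is a stand-in for whatever API you're calling, and the filler/probe texts are made up; real evals (presumably including Anthropic's) vary depths, distractors, and haystack lengths far more systematically.

    def make_haystack(needle: str, filler: str, n_filler: int, depth: float) -> str:
        """Bury the needle at relative depth (0.0 = start, 1.0 = end) among filler lines."""
        lines = [filler] * n_filler
        lines.insert(int(depth * n_filler), needle)
        return "\n".join(lines)

    def needle_retrieval_rate(ask_model, probes, n_filler=2000,
                              depths=(0.0, 0.25, 0.5, 0.75, 1.0)) -> float:
        """Fraction of (needle, question, answer) probes answered correctly across
        insertion depths. ask_model(context, question) -> str is assumed."""
        hits = trials = 0
        for needle, question, answer in probes:
            for depth in depths:
                context = make_haystack(needle, "The sky was a uniform grey.", n_filler, depth)
                reply = ask_model(context, question)
                hits += int(answer.lower() in reply.lower())
                trials += 1
        return hits / trials

Note that a 78.3% score in this framing just means 78.3% of probes came back with the planted answer; it says nothing about whether a miss was a silent omission or a confidently wrong answer, which is the crux of the hallucination question above.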
now that's major news
In addition to context rot, cost matters. I think a lot of people use token compression tools for cost reasons, not because of context rot.
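For the cost angle, a minimal sketch of the bluntest kind of token compression: trim history to a token budget, keeping the newest turns. The tiktoken tokenizer is an assumption (substitute whatever your stack counts tokens with), and real compression tools typically summarize dropped turns rather than discard them outright:

    import tiktoken  # assumed tokenizer; any token counter works

    def trim_to_budget(turns: list[str], budget: int) -> list[str]:
        """Keep the most recent turns whose total token count fits the budget."""
        enc = tiktoken.get_encoding("cl100k_base")
        kept: list[str] = []
        used = 0
        for turn in reversed(turns):   # walk newest -> oldest
            n = len(enc.encode(turn))
            if used + n > budget:
                break
            kept.append(turn)
            used += n
        return list(reversed(kept))    # restore chronological order

Since input tokens are billed on every call (ignoring prompt caching), cutting a 200K-token history down to 20K cuts that part of the bill ~10x, regardless of how well the model handles long context.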
From a determinism standpoint, it might be better for the rot to occur at ingest rather than arbitrarily five questions later.