Comment by rishabhaiover
5 hours ago
After a certain amount of context usage, I think I empirically see the stated issues with Top-K compression strategy. It doesn't catastrophically forget but nuances fade as I reach towards the tail end of my context limits.
Yeah, that’s consistent. topK keeps the obvious tokens, but subtle context gets eroded over time rather than dropped all at once.