Comment by RandyRanderson
11 hours ago
Why is it surprising that, at some point, more information will lead to worse performance?
It seems obvious. Moreover, in a simple model, whatever tokens you add have to carry MORE information than the average token already in the window; otherwise they pull the average down.
In a non-trivial model (and this is the model I would choose), since you are adding them to the end, they likely have to have MUCH more information.
Proof, as always, is left as an exercise for the reader.
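
For the simple model, the averaging argument can at least be checked with arithmetic. This is only a sketch under assumptions that are not in the comment itself: that "information" can be scored per token, and that the quantity of interest is the window's mean score. The function names and the numbers are hypothetical, and it says nothing about how a real model behaves.

```python
def mean_information(window):
    """Average per-token information of a window (arbitrary units)."""
    return sum(window) / len(window)


def appended_mean(window, new_tokens):
    """Average per-token information after appending new_tokens to the window."""
    return mean_information(list(window) + list(new_tokens))


# Hypothetical per-token information scores; the numbers are made up.
window = [2.0, 4.0, 6.0]  # existing mean is 4.0

richer = appended_mean(window, [8.0])  # appended token is above the mean
poorer = appended_mean(window, [1.0])  # appended token is below the mean

print(mean_information(window), richer, poorer)  # prints: 4.0 5.0 3.25
```

Appending a token that scores above the current mean raises the average, and one that scores below it lowers the average, which is the claim for the simple model. The non-trivial case (tokens added at the end needing MUCH more information) would need some position-dependent weighting, which is not sketched here.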