Comment by therealpygon
10 hours ago
Attention performance quite literally degrades as context grows, at least for lookups that aren't needle-in-a-haystack, in almost every model and to varying degrees. So, to answer the question: the “waste” is making the model dumber unnecessarily in an attempt to make it smarter.
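For intuition only, here is a toy sketch of one way this can happen: softmax dilution. It is not a measurement of any real model. The function name `weight_on_relevant` and the margin values are made up for illustration, and it assumes a single relevant token whose attention score beats every distractor by a fixed margin.

```python
import math


def weight_on_relevant(n_context: int, margin: float) -> float:
    """Softmax weight on one relevant token among n_context - 1 distractors.

    Toy assumption: every distractor has logit 0 and the relevant token
    has logit `margin`. Real attention scores are not this tidy.
    """
    relevant = math.exp(margin)
    distractors = (n_context - 1) * math.exp(0.0)
    return relevant / (relevant + distractors)


# A small margin stands in for a fuzzy, non-needle lookup;
# a large margin stands in for a needle that sticks out sharply.
for margin in (2.0, 12.0):
    print(f"margin = {margin}")
    for n in (10, 1_000, 100_000):
        w = weight_on_relevant(n, margin)
        print(f"  {n:>7} tokens: weight on relevant token = {w:.4f}")
```

With a small margin, the weight on the relevant token shrinks quickly as the distractor count grows; with a large margin it holds up much longer. That is at least consistent with needle-style tests looking fine while fuzzier lookups suffer.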