Comment by maxbond
7 months ago
That's a super interesting hypothesis. From an information-theoretic perspective, rarer tokens carry more information. Maybe that results in the all-caps tokens being weighted more heavily by the attention mechanism.
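To make the "rarer tokens are more informative" point concrete, here is a minimal sketch of self-information (surprisal), defined as -log2(p) for a token with probability p. The toy corpus and the helper name `surprisal_bits` are made up for illustration, and the probabilities are plain empirical frequencies rather than anything a real tokenizer or model would produce. It only illustrates the information-theory half of the comment; whether attention actually weights such tokens higher remains speculation.

```python
import math
from collections import Counter

def surprisal_bits(tokens):
    """Map each token to -log2(p), with p its empirical frequency in `tokens`."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: -math.log2(n / total) for tok, n in counts.items()}

# Hypothetical toy corpus (an assumption for illustration only):
# the lowercase "not" appears twice, the all-caps "NOT" appears once.
corpus = ["please"] * 4 + ["not"] * 2 + ["do"] + ["NOT"]
info = surprisal_bits(corpus)

# The rarer all-caps token carries more bits than its lowercase form.
print(info["not"], info["NOT"])
```

In this toy corpus the all-caps token occurs half as often as the lowercase one, so it carries exactly one more bit of information.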