Comment by himata4113
19 hours ago
well 8096 is just the first number that came to my mind, obviously frontier models have 32k or above, but they essentially they have a layer which "looks" at a limited view of the entire context window. {[1m x 3-4 weights] attention layer to determine what is actually important} -> {all other layers}
No comments yet
Contribute on Hacker News ↗