Slacker News

Comment by himata4113

17 hours ago

Well, 8096 is just the first number that came to mind. Obviously frontier models have context windows of 32k tokens or more, but essentially they have a layer that "looks" at a limited view of the entire context window: {[1m x 3-4 weights] attention layer to determine what is actually important} -> {all other layers}
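A rough sketch of that two-stage picture (a cheap pass decides which positions matter, then full attention runs only over those) might look like the code below. This is a minimal illustration under stated assumptions: the helper names `select_top_k` and `sparse_attention`, the use of small low-dimensional "cheap" keys for scoring, and the top-k selection rule are all hypothetical choices made for illustration, not how any particular frontier model is actually built.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_top_k(query_small, keys_small, k):
    """Hypothetical cheap scoring pass: score every position with small,
    low-dimensional vectors and return the k best indices in order."""
    scores = [dot(query_small, key) for key in keys_small]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def sparse_attention(query, keys, values, selected):
    """Standard scaled dot-product attention, restricted to `selected`.
    Positions not in `selected` contribute nothing to the output."""
    d = len(query)
    scores = [dot(query, keys[i]) / math.sqrt(d) for i in selected]
    weights = softmax(scores)
    out = [0.0] * len(values[0])
    for w, i in zip(weights, selected):
        for j, v in enumerate(values[i]):
            out[j] += w * v
    return out


# Toy context of 6 positions; the cheap pass keeps only 2 of them.
keys_small = [[0.1], [0.9], [0.2], [0.8], [0.0], [0.3]]
selected = select_top_k([1.0], keys_small, k=2)
print(selected)  # [1, 3]

keys = [[1.0, 0.0] for _ in range(6)]
values = [[float(i)] for i in range(6)]
out = sparse_attention([1.0, 0.0], keys, values, selected)
print(out)  # [2.0]
```

Because all full keys are identical here, the two selected positions get equal weight and the output is the mean of `values[1]` and `values[3]`; positions 0, 2, 4 and 5 are never attended to at all, which is the point of the cheap first stage.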

