Comment by danielhanchen
7 days ago
Unsure, but yes, most likely they use YaRN, and maybe trained a bit more on long context (or not)