Comment by danielhanchen
3 days ago
Unsure, but yes, most likely they use YaRN, and they may have trained a bit more on long-context data (or not).