Comment by smallerize
9 hours ago
I think it means most of the training data is short. And a lot of the long-context examples are conversations where the middle turns are less important.
9 hours ago
I think it means most of the training data is short. And a lot of the long-context examples are conversations where the middle turns are less important.
No comments yet
Contribute on Hacker News ↗