Comment by jimmyl02
15 days ago
The large context windows generally involve RoPE[0], which is a trick that lets the model train on a smaller window and then extend to a longer one at inference time. It seems like they have a new "iRoPE", which might have better performance?
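A minimal sketch of the idea, assuming the standard rotary-embedding formulation plus simple position interpolation to stretch a short training window to a longer inference window; the context sizes, scale factor, and function names here are illustrative, not from any specific model:

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Apply rotary position embedding to x of shape (seq_len, dim).

    Dimension pairs (i, i + dim/2) are rotated by angle theta_i * position,
    where theta_i = base ** (-2i / dim).
    """
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair frequencies: fast rotation in early dims, slow in late dims.
    freqs = base ** (-np.arange(half) * 2.0 / dim)   # (half,)
    angles = np.outer(positions, freqs)              # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation applied to each (x1_i, x2_i) pair.
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

# Train at 8K positions, run inference at 32K by interpolating positions so
# they stay inside the range seen during training (hypothetical sizes).
train_ctx, infer_ctx = 8192, 32768
positions = np.arange(infer_ctx) * (train_ctx / infer_ctx)

q = np.random.randn(infer_ctx, 128).astype(np.float32)
q_rot = rope_rotate(q, positions)
print(q_rot.shape)  # (32768, 128)
```

Because the position only enters through these rotations of query/key pairs, rescaling or re-parameterizing the positions at inference is what lets the effective window grow beyond the training length.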