Comment by littlestymaar
14 days ago
I read somewhere that it was trained with a 256k-token context window and then extended with RoPE on top of that, rather than starting from 16k like everyone else does, IIRC. So even if it isn't really flawless at 10M, I'd expect it to be much stronger than its competitors up to those 256k.