Comment by MoonGhost
3 months ago
I think the problem is with the positional encoding. If the model cannot clearly separate tokens in the context window, they overlap, which leads to a mess. It's the encoding that matters, not the actual position.
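For concreteness, here is a minimal sketch (not from the comment itself) of the standard sinusoidal positional encoding from the original Transformer paper. It illustrates the kind of overlap the commenter describes: nearby positions receive nearly identical encodings, and even distant positions stay partially correlated through the low-frequency dimensions.

```python
import numpy as np

def sinusoidal_pe(seq_len, d_model):
    """Standard Transformer sinusoidal positional encoding:
    PE(pos, 2i) = sin(pos / 10000^(2i/d)), PE(pos, 2i+1) = cos(...)."""
    pos = np.arange(seq_len)[:, None]        # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]     # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)             # even dims
    pe[:, 1::2] = np.cos(angles)             # odd dims
    return pe

pe = sinusoidal_pe(4096, 64)
# Cosine similarity between position 0 and every other position:
sims = pe @ pe[0] / (np.linalg.norm(pe, axis=1) * np.linalg.norm(pe[0]))
print(sims[:8])    # adjacent positions: encodings are almost identical
print(sims[-8:])   # distant positions: still partially correlated
```

If two positions' encodings are this similar, attention has little signal to tell their tokens apart, which is one way the "overlap" in the comment could play out.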