Comment by famouswaffles
1 year ago
For a transformer, the whole context is already re-read at every token, so it can fetch information that has become useful at any time. I don't see what problem there is to solve here.
1 year ago
A transformer's context window is limited, so the same kind of problem applies once the input grows past that window.
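To make both points concrete, here is a rough sketch of which positions a causal decoder can look at when it produces a given token. The function name `visible_positions` and the `window` parameter are made up for illustration and do not come from any particular library; real models differ in how they truncate or slide the window.

```python
def visible_positions(step, window=None):
    """Positions a causal decoder can attend to when producing token `step`.

    `window` is a hypothetical context limit in tokens; None means unlimited.
    """
    start = 0 if window is None else max(0, step + 1 - window)
    return list(range(start, step + 1))

# No limit: every earlier token is available again at every step.
print(visible_positions(5))            # [0, 1, 2, 3, 4, 5]

# Assumed limit of 4 tokens: positions 0 and 1 have fallen out of reach.
print(visible_positions(5, window=4))  # [2, 3, 4, 5]
```

With no limit, anything that becomes useful later is still reachable. With a limit, whatever fell outside the window is gone unless something carried it forward.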