Comment by immibis
1 year ago
An RNN doesn't run out of memory from that, so they are still fundamentally different.
How do you encode arbitrarily long positions, anyway?
1 year ago
They are different, but transformers don't have a fixed window: you can extend the context or shrink it. I think you can also extend a positional encoding to longer sequences, as long as it isn't a learned encoding.
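For the non-learned case, here is a minimal sketch of the fixed sinusoidal encoding from "Attention Is All You Need". Since it is a closed-form function of the position index, it can be evaluated at arbitrarily large positions rather than only those seen during training. The function name and the use of NumPy here are just for illustration:

```python
# Minimal sketch: the fixed sinusoidal positional encoding from
# "Attention Is All You Need". It is a closed-form function of the
# position index, so it can be computed for any position, which is why
# a non-learned encoding can be extended to longer contexts.
import numpy as np

def sinusoidal_positional_encoding(positions, d_model):
    """Return an array of shape (len(positions), d_model).

    positions: iterable of integer position indices (can be arbitrarily large)
    d_model:   embedding dimension (assumed even here for simplicity)
    """
    positions = np.asarray(positions, dtype=np.float64)[:, None]   # (n, 1)
    dims = np.arange(0, d_model, 2, dtype=np.float64)[None, :]     # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)         # (n, d_model/2)
    enc = np.zeros((positions.shape[0], d_model))
    enc[:, 0::2] = np.sin(angles)   # even dimensions get sine
    enc[:, 1::2] = np.cos(angles)   # odd dimensions get cosine
    return enc

# Works the same for position 5 and position 1_000_000:
print(sinusoidal_positional_encoding([5, 1_000_000], d_model=8))
```

A learned positional embedding table, by contrast, only has rows for the positions it was trained on, which is where the extension problem comes from.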