
Comment by WithinReason

1 year ago

No, you can give as much context to a transformer as you want; you just run out of memory.

An RNN doesn't run out of memory from longer input, so the two are still fundamentally different.

How do you encode arbitrarily long positions, anyway?

  • They are different, but transformers don't have a fixed window: you can extend or shrink the context. I think you can extend a positional encoding if it's not a learned one (see the sketch below).
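
A minimal sketch of the kind of non-learned encoding that comment means: the sinusoidal positional encoding from the original Transformer paper is a fixed function of the position index, so it can be evaluated at positions far beyond any training length (whether the model attends well at those lengths is a separate question). The function name and dimensions below are illustrative, not from any specific library.

```python
import math

def sinusoidal_encoding(position, d_model):
    """Sinusoidal positional encoding for a single position.

    It is a closed-form function of the position index, so nothing
    here depends on a maximum length: any integer position can be
    encoded, which is why a non-learned encoding extends past the
    lengths seen during training.
    """
    enc = []
    for i in range(d_model // 2):
        freq = 1.0 / (10000 ** (2 * i / d_model))
        enc.append(math.sin(position * freq))
        enc.append(math.cos(position * freq))
    return enc

# Works the same for position 10 or position 1_000_000;
# a learned embedding table would have no row for the latter.
short = sinusoidal_encoding(10, 64)
long = sinusoidal_encoding(1_000_000, 64)
```

A learned positional embedding, by contrast, is just a lookup table with a fixed number of rows, which is where the "fixed window" intuition comes from.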