Comment by hansonw
1 year ago
This is also a good paper on the subject:
What Algorithms can Transformers Learn? A Study in Length Generalization https://arxiv.org/abs/2310.16028
1 year ago
Yes, this is a good empirical study of the types of tasks that transformers have been shown to be unable to generalise on.
With both empirical and theoretical support, I think it's pretty clear that this is a real limitation.