Comment by papichulo2023
1 year ago
Not an authority on the matter, but afaik, with position encodings (part of the Transformer architecture), they can handle higher dimensionality just fine. Some people have actually tried 2D Transformers and got comparable results.
Vision Transformers are gaining traction, and they are 100% focused on 2D data.
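For what it's worth, here is a minimal NumPy sketch of one common 2D position-encoding scheme (the kind used in ViT/MAE-style models): half the channels encode the row via the standard 1D sinusoidal formula, the other half encode the column. Function names here are just illustrative.

```python
import numpy as np

def sincos_1d(positions, dim):
    # Standard 1D sinusoidal encoding (as in the original Transformer paper).
    # dim must be even: first half sines, second half cosines.
    omega = 1.0 / (10000 ** (np.arange(dim // 2) / (dim // 2)))
    angles = positions[:, None] * omega[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

def sincos_2d(h, w, dim):
    # 2D encoding for an h x w grid: concatenate a row encoding and a
    # column encoding, each using half of the channel dimension.
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    enc_y = sincos_1d(ys.reshape(-1), dim // 2)
    enc_x = sincos_1d(xs.reshape(-1), dim // 2)
    return np.concatenate([enc_y, enc_x], axis=1)  # shape (h*w, dim)

pe = sincos_2d(4, 4, 8)
print(pe.shape)  # (16, 8)
```

Each grid cell gets a unique vector, and relative offsets in either axis shift the phases smoothly, which is why attention can pick up on 2D structure.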