Comment by 3vidence

2 months ago

Not an expert in this space.

Aren't tokens transformed with position-dependent information in most models?

I believe Llama applies a rotation to the token vectors based on their position in the input.
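
Roughly something like this, if I understand it right (a toy sketch of the rotation idea only, not Llama's actual implementation; the pairing of dimensions and the frequencies are simplified):

```python
import torch

def rope_rotate(x, position, base=10000.0):
    """Toy rotary-style rotation of a single feature vector x (even length).
    The two halves of x are treated as (real, imaginary) parts and rotated
    by angles that grow with the token's position."""
    half = x.shape[-1] // 2
    freqs = base ** (-torch.arange(half, dtype=x.dtype) / half)
    angles = position * freqs
    cos, sin = torch.cos(angles), torch.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# The same token vector comes out different depending on where it sits.
v = torch.randn(8)
print(rope_rotate(v, position=0))
print(rope_rotate(v, position=5))
```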

That's true in the realm of LLMs. But even in this case, the position information is added only at the first layer; tokens in later layers can choose to "forget" it.

In addition, there are applications of transformers in other domains. See https://github.com/cvg/LightGlue or https://facebookresearch.github.io/3detr/
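
To make the first point concrete, here is a toy sketch of the classic additive setup: positions are mixed into the embeddings once at the input, and every later layer only sees the combined representation (illustrative code only, not any particular model's implementation):

```python
import torch
import torch.nn as nn

class ToyTransformer(nn.Module):
    """Minimal sketch: positions are injected once at the input;
    the encoder layers afterwards only see the mixed representation."""
    def __init__(self, vocab=1000, d_model=64, max_len=128, n_layers=4):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(max_len, d_model)   # learned absolute positions
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.layers = nn.TransformerEncoder(layer, n_layers)

    def forward(self, ids):                          # ids: (batch, seq)
        positions = torch.arange(ids.shape[1], device=ids.device)
        x = self.tok(ids) + self.pos(positions)      # position info added here only
        return self.layers(x)                        # nothing later re-adds it

model = ToyTransformer()
out = model(torch.randint(0, 1000, (2, 16)))
print(out.shape)   # (2, 16, 64)
```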