Comment by topwalktown

8 months ago

Transformers like Llama use rotary position embeddings (RoPE), which are applied to the queries and keys in every single attention layer.
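For anyone curious what that rotation looks like, here is a rough sketch in plain Python. It is not taken from Llama's code: the function names `rope` and `score`, the base of 10000, and the choice to pair adjacent dimensions are all assumptions for illustration (real implementations differ in how they pair dimensions and work on batched tensors).

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles that grow with `position`.

    Hypothetical helper: pairs (0,1), (2,3), ... are each treated as a 2D
    point and rotated; later pairs use lower frequencies.
    """
    d = len(vec)
    assert d % 2 == 0, "head dimension must be even"
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # angle for this pair
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def score(q, k, q_pos, k_pos):
    """Attention logit between a rotated query and a rotated key."""
    return sum(a * b for a, b in zip(rope(q, q_pos), rope(k, k_pos)))
```

Because both the query and the key are rotated, the resulting score depends only on the offset between the two positions, so `score(q, k, 3, 1)` and `score(q, k, 10, 8)` should agree up to floating-point error. In a model, this rotation would be repeated inside each attention layer rather than added once at the input.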

1 comment

topwalktown

Very interesting! Do you know if there were any studies about whether this improves performance?