Comment by aesthesia
1 hour ago
Audio is 1 dimensional so the usual RoPE position encoding should handle it like it does for text tokens. You only need extra position encoding for higher-dimensional stuff like images.
1 hour ago
Audio is 1 dimensional so the usual RoPE position encoding should handle it like it does for text tokens. You only need extra position encoding for higher-dimensional stuff like images.
No comments yet
Contribute on Hacker News ↗