Comment by mchinen
4 hours ago
Ah yeah, thinking further it's probably just using some positioning embedding based on sequence numbering added in the LLM layers. For vision it needs the patch location as well.
4 hours ago
Ah yeah, thinking further it's probably just using some positioning embedding based on sequence numbering added in the LLM layers. For vision it needs the patch location as well.
No comments yet
Contribute on Hacker News ↗