Comment by curiousfiddler
1 day ago
So why would this extract more semantic meaning than multi-head attention? Isn't the whole point of having multiple heads, much like the multiple filter types in a CNN, to extract different semantic relationships?