Comment by lostmsu
2 years ago
I think this is a case of a person with a hammer seeing everything as a nail. Attention is no more a kernel mechanism than it is a form of matrix decomposition or even a bilinear form. It is similar to all of these things, but not quite the same as any of them.
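To make the comparison concrete, here is a minimal NumPy sketch, assuming plain single-head scaled dot-product attention. The function name `attention_views` and the matrices `W_q` and `W_k` are illustrative names, not taken from any library. It computes the pre-softmax scores both the usual way and as a bilinear form x_i^T A x_j with A = W_q W_k^T / sqrt(d_k), then row-normalizes their exponentials the way a kernel smoother would.

```python
import numpy as np

def attention_views(X, W_q, W_k):
    """Return attention scores computed two ways, plus the softmax weights.

    X:   (n, d) token representations
    W_q: (d, d_k) query projection (illustrative name)
    W_k: (d, d_k) key projection (illustrative name)
    """
    d_k = W_q.shape[1]
    Q, K = X @ W_q, X @ W_k

    # Usual scaled dot-product scores.
    scores_qk = Q @ K.T / np.sqrt(d_k)

    # Same scores written as a bilinear form x_i^T A x_j,
    # where A has rank at most d_k.
    A = W_q @ W_k.T / np.sqrt(d_k)
    scores_bilinear = X @ A @ X.T

    # Softmax as a row-normalized exponential "kernel".
    # Subtracting the row max only improves numerical stability.
    kernel = np.exp(scores_qk - scores_qk.max(axis=1, keepdims=True))
    weights = kernel / kernel.sum(axis=1, keepdims=True)
    return scores_qk, scores_bilinear, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q = rng.normal(size=(8, 4))
W_k = rng.normal(size=(8, 4))

s_qk, s_bil, w = attention_views(X, W_q, W_k)
print("bilinear form matches Q K^T:", np.allclose(s_qk, s_bil))
print("rows of weights sum to 1:   ", np.allclose(w.sum(axis=1), 1.0))
print("score matrix is symmetric:  ", np.allclose(s_qk, s_qk.T))
```

With independent `W_q` and `W_k`, the score matrix is generally not symmetric, so the exponential "kernel" here is not a symmetric positive semi-definite kernel in the usual sense. That is one way to see the "similar but not quite the same" point.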
Is attention more than "hyperplane flatness"?