Comment by Mars008
6 days ago
1. Something I don't understand. Wasn't attention with query/key supposed to filter out irrelevant tokens?
2. This CatsAttack has many applications. For example, it can probably confuse safety and spam filters. It could also be tried on image generators.
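Attention can still assign non-zero weight to irrelevant tokens. Softmax never outputs exactly zero, and the query/key projections are trained to minimize prediction loss, not to judge semantic relevance. Because each attention output is a weighted sum of value vectors, even a small weight on an irrelevant token mixes its value into the hidden state, where it can interfere with downstream computation.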
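To make the "small weight still leaks in" point concrete, here is a minimal sketch in plain Python. It assumes a single query, no learned projections, and made-up toy vectors; `softmax` and `attend` are hypothetical helpers written for this illustration, not any library's API. The distractor key is orthogonal to the query, so its raw score is 0, yet it still receives a few percent of the attention and its large value vector visibly shifts the output.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability; the result is mathematically unchanged.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-query scaled dot-product attention (hypothetical helper, no projections)."""
    d = len(query)
    # Raw scores: dot(query, key) / sqrt(d), one per token.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Output is a weighted sum of value vectors, so every token contributes something.
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, out

# Toy setup (made-up numbers): two relevant tokens plus one "distractor".
query = [1.0, 0.0]
relevant_keys = [[4.0, 0.0], [3.5, 0.5]]
relevant_values = [[1.0, 0.0], [1.0, 0.0]]
distractor_key = [0.0, 1.0]      # orthogonal to the query, so its raw score is 0
distractor_value = [0.0, 10.0]   # but it carries a large, unrelated signal

w_clean, out_clean = attend(query, relevant_keys, relevant_values)
w_noisy, out_noisy = attend(query, relevant_keys + [distractor_key],
                            relevant_values + [distractor_value])

print("distractor weight:", w_noisy[-1])
print("clean output:", out_clean)
print("noisy output:", out_noisy)
```

With these numbers the distractor gets roughly 3% of the weight, but because its value is large it moves the second output coordinate from 0 to about 0.34. Real models have learned projections and many heads and layers, so this only illustrates the mechanism, not how large the effect is in any particular model.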