Comment by sametoymak
3 years ago
Here is what I meant:
Standard SVM classifier: maps an input sequence to a 0/1 label. Example: take a paragraph and return its sentiment. During training, the label is given.
Transformer's SVM: takes an input sequence, suppresses the bad tokens, and passes the good tokens to the next layer. This is a token selector rather than a classifier.
Example: take a paragraph and output its salient words. We don't know ahead of time which words are salient; the model has to figure that out during training.
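A minimal sketch of that token-selector view (all names here are illustrative, not from the comment): each token gets a margin-like score, and a softmax suppresses the low-score tokens while passing the high-score ones forward as a weighted combination.

```python
import math
import random

random.seed(0)
dim = 4
# Five toy token embeddings standing in for words of a paragraph.
tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

query = tokens[0]                            # e.g. the first token attends to the rest
scores = [dot(t, query) for t in tokens]     # one SVM-like decision value per token
exps = [math.exp(s) for s in scores]
weights = [e / sum(exps) for e in exps]      # softmax: soft token selection

# The layer's output is a convex combination dominated by the
# highest-scoring ("salient") tokens; low-score tokens are suppressed.
output = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]
```

With large score margins the weights approach a one-hot selection, which is the hard token-selector behavior described above.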