Comment by ftxbro

3 years ago

> It solves an SVM that separates 'good' tokens within each input sequence from 'bad' tokens. This SVM serves as a good-token-selector and is inherently different from the traditional SVM which assigns a 0-1 label to inputs.

Sorry, but how is separating 'good' tokens from 'bad' tokens inherently different from assigning a 0-1 label?

Here is what I meant:

Standard SVM classifier: maps an input sequence to a 0-1 label. Example: take a paragraph and return its sentiment. During training, the label is specified.
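A minimal sketch of that standard setup, using scikit-learn on hand-made toy sentiment data (the texts and labels here are made up for illustration): every training example comes with its label, and the SVM learns a single decision for the whole input.

```python
# Toy illustration: a standard SVM maps each whole input to one 0-1 label.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hand-made toy data; crucially, the labels are given at training time.
texts = [
    "great movie, loved it",
    "terrible plot, boring",
    "wonderful acting and story",
    "awful and dull throughout",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

X = TfidfVectorizer().fit_transform(texts)
clf = LinearSVC().fit(X, labels)

pred = clf.predict(X)  # one label per paragraph, not per word
```

The point is the shape of the problem: one sequence in, one supervised label out.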

Transformer's SVM: takes an input sequence, suppresses the bad tokens, and passes the good tokens on to the next layer. This is a token selector rather than a classifier.

Example: take a paragraph and output the salient words in it. We don't know which words are salient ahead of time; the model has to figure that out during training.
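The token-selector behavior described above can be sketched with a single softmax attention step in NumPy. Everything here is hand-constructed for illustration: the token embeddings and the query direction `q` are made-up values standing in for what the model would learn; there are no per-token labels anywhere.

```python
import numpy as np

# Hypothetical 4-token sequence with 2-dimensional embeddings.
# Tokens pointing along the first axis play the role of "good" tokens.
tokens = np.array([
    [ 1.0,  0.0],   # good (salient)
    [-1.0,  0.5],   # bad
    [ 1.0,  0.1],   # good (salient)
    [ 0.0, -1.0],   # bad
])

# A query direction the model would normally learn; hand-picked here
# so that it aligns with the "good" tokens.
q = np.array([4.0, 0.0])

scores = tokens @ q                        # similarity of each token to q
weights = np.exp(scores - scores.max())    # stable softmax over tokens
weights /= weights.sum()

# The output is a weighted mix of tokens: high-weight tokens are
# "selected", low-weight tokens are suppressed and barely propagate.
output = weights @ tokens
```

Note the contrast with the classifier sketch: nothing labels individual tokens as good or bad; the separation emerges from which tokens the attention weights concentrate on.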