Comment by valine

2 years ago

Or, alternatively, a lot of energy is wasted answering simple questions.

The whole point of the transformer is to take words and iteratively, layer by layer, use the context to refine their meaning. The vector you get out is a better representation of the true meaning of the token. I’d argue that’s loosely akin to ‘understanding’.
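Loosely, here is what that layer-by-layer refinement looks like as a toy, NumPy-only sketch (everything below is an illustrative assumption, not anyone's actual model): start from context-free token vectors and let each attention layer mix in the neighbors, so the vector for an ambiguous token like "bank" drifts toward its in-context meaning.

```python
# Toy sketch of iterative contextual refinement via self-attention.
# Dimensions, token list, and random weights are all illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d_model = 16          # embedding width (assumed)
n_layers = 4          # number of refinement layers (assumed)

def self_attention(x, w_q, w_k, w_v):
    """One attention step: each token's vector becomes a context-weighted mix."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(d_model)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Start from context-free token embeddings (random stand-ins here).
tokens = ["the", "river", "bank", "flooded"]
x = rng.normal(size=(len(tokens), d_model))

for layer in range(n_layers):
    w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))
    x = x + self_attention(x, w_q, w_k, w_v)   # residual update: refine, don't replace

# After the stack, x[2] is the vector for "bank", now shaped by "river" and "flooded".
print(x[2][:4])
```

The residual update in the loop is the important bit: each layer nudges the existing representation using context rather than replacing it, which is the "iteratively refine their meaning" idea above.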

The fact that the transformer architecture can memorize text is far more surprising to me than the idea that it might understand tokens.