Comment by viccis

6 hours ago

Vector spaces and bag of words models are not specifically related to LLMs, so I think that's irrelevant to this topic. It's not about "knowledge", just the ability to represent words in such a way that similarities between them take on useful computational characteristics.

Well, pretty much all LLMs are based on the decoder-only version of the Transformer architecture (in fact it’s the T in GPT).

And in the Transformer architecture you’re working with embeddings, which are exactly what this article is about: vector representations of words.
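To make that concrete, here's a minimal sketch (not from the article) of the very first step in a decoder-only Transformer: an embedding table that maps token IDs to dense vectors, which is what every later layer actually operates on. I'm assuming PyTorch's `nn.Embedding` here, and the vocabulary size, embedding dimension, and token IDs are made-up toy values.

```python
import torch
import torch.nn as nn

# Toy dimensions, roughly GPT-2-like; real models pick their own values.
vocab_size, d_model = 50_000, 768

# A lookup table: one learned d_model-dimensional vector per token in the vocabulary.
embedding = nn.Embedding(vocab_size, d_model)

# A short sequence of (arbitrary) token IDs, batch size 1.
token_ids = torch.tensor([[464, 2746, 318]])

# Each ID is replaced by its vector; everything downstream works on these.
vectors = embedding(token_ids)
print(vectors.shape)  # torch.Size([1, 3, 768])
```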