Comment by redcobra762
2 years ago
There's got to be a probability cut-off, though. LLMs don't connect every single token with every other token; some aren't connected at all, even if some association is taught, right?
The weights have finite precision, which means they effectively represent value ranges / have error bars. So even if a weight is exactly 0, that does not represent complete confidence that the association will never occur.
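A minimal sketch of the related point about there being no hard cut-off (my own illustration, not from the thread): with a standard softmax over logits, even a wildly unlikely next token gets a small nonzero probability, unless the sampler itself imposes a cut-off (e.g. top-k or top-p). The logit values below are made up for illustration.

    # Illustrative sketch: softmax never assigns exactly zero probability,
    # so "very unlikely" is not the same as "impossible".
    import numpy as np

    logits = np.array([10.0, 2.0, -15.0])   # hypothetical scores for 3 tokens
    probs = np.exp(logits - logits.max())   # subtract max for numerical stability
    probs /= probs.sum()

    print(probs)            # roughly [9.997e-01, 3.35e-04, 1.4e-11]
    print(probs.min() > 0)  # True: tiny, but never exactly zero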
A weight necessitates a relationship, but I’m arguing LLMs don’t create all relationships. So a connection wouldn’t even exist.
When relationships are represented implicitly by the magnitude of the dot product between two vectors, there's no particular advantage to not "creating" all relationships (i.e. enforcing orthogonality for "uncreated" relationships).
On the contrary, by allowing vectors for unrelated concepts to be only almost orthogonal, it's possible to represent a much larger number of unrelated concepts. https://terrytao.wordpress.com/2013/07/18/a-cheap-version-of...
In machine learning, this phenomenon is known as polysemanticity or superposition https://transformer-circuits.pub/2022/toy_model/index.html
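A quick numerical sketch of the near-orthogonality point (my own, not taken from the linked posts; the dimension and count are arbitrary): random unit vectors in a high-dimensional space have pairwise dot products that are small but essentially never exactly zero, and you can pack far more of them than the dimension allows for strictly orthogonal vectors.

    # Illustrative sketch: 4096 random "concept" vectors in only 512 dimensions
    # are all almost orthogonal to each other, but none are exactly orthogonal.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 512, 4096
    vecs = rng.standard_normal((n, d))
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit vectors

    dots = vecs @ vecs.T
    np.fill_diagonal(dots, 0.0)          # ignore self-similarity

    print(np.abs(dots).max())    # typically around 0.25: small overlap, not 0
    print(np.abs(dots).mean())   # typically around 0.035 (~1/sqrt(d))

So with 8x more vectors than dimensions, the worst-case overlap is still modest, which is the kind of capacity gain the Tao post and the superposition paper are describing.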