Comment by PaulHoule
1 day ago
You're right that "systems competent in language" (whether humans or LLMs) are able to accept and understand slightly wrong sequences while generating correct sequences almost all of the time. (Who hasn't made a typo when talking to a chatbot and had the chatbot ignore the typo and respond correctly?)
Treating G(t) as a binary function works for linguists who need a paradigm to do "normal science," but Chomsky's theory has not been very useful for building linguistically competent machines, so there must be something seriously wrong with it.
Still, the vast majority of sequences t are gibberish that is nowhere near being valid. If those gibberish sequences are representable in the embedding space and take up a volume anywhere near their numeric prevalence, then I can only imagine that in a (say) N=3000 embedding space there is something like a manifold of dimension 2999 or 2998 or 1500 or so sitting inside the flat embedding space -- that structure would be the non-flat embedding you're looking for, or an approximation to it.
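As a rough way to probe this, here's a minimal sketch (the data here is synthetic; in practice you'd substitute a matrix of real sentence embeddings from whatever model you use) that counts how many PCA directions are needed to capture most of the variance -- a crude proxy for how much of the flat N=3000 space the valid sequences actually occupy:

    import numpy as np

    def effective_dimension(embeddings: np.ndarray, variance_kept: float = 0.99) -> int:
        """Crude intrinsic-dimension proxy: number of principal components
        needed to explain `variance_kept` of the total variance."""
        X = embeddings - embeddings.mean(axis=0)        # center the point cloud
        s = np.linalg.svd(X, compute_uv=False)          # PCA spectrum via SVD
        explained = np.cumsum(s**2) / np.sum(s**2)      # cumulative variance ratio
        return int(np.searchsorted(explained, variance_kept) + 1)

    # Stand-in data: fake a low-dimensional structure inside an N=3000 space
    # by mixing a 40-dim latent signal up to 3000 dims and adding small noise.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(1000, 40))
    mixing = rng.normal(size=(40, 3000))
    fake_embeddings = latent @ mixing + 0.01 * rng.normal(size=(1000, 3000))

    print(effective_dimension(fake_embeddings))         # close to 40, nowhere near 3000

A linear variance count like this can only give an upper bound on a curved or fractal structure's dimension, but it's enough to show the "thin manifold in a big flat box" picture in miniature.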
It might be that it is not really a manifold, or that it has different dimensionalities in different places, or even fractional dimensionalities. For instance, you'd hope it would geometrically represent semantics of various sorts, as suggested by the graphs here
https://nlp.stanford.edu/projects/glove/ [1]
So I've thought a lot about symplectic spaces in higher dimensions, where area has to be conserved under various transformations (the propagator), and maybe that has led me to think about this in the totally wrong way -- maybe the flat embedding space doesn't devote a large volume to gibberish because it was never trained to model gibberish strings, which would have interesting implications if true.
Something else I think of is John Wheeler's idea of superspace in quantum gravity: even though space-time looks like a smooth manifold to us, the correct representation in the quantum theory might be discrete. Maybe for points a, b the possibilities are that (1) a and b are the same point, (2) a is the future of b, (3) b is the future of a, or (4) a and b are not causally connected. So you have this thing which exists on one level as something basically symbolic but looks like a manifold if you live in it and you're much bigger than the Planck length.
But to get to the question of "why do we flatten it?": we're not flattening it deliberately. The "flattening" is done by the neural network, and we don't know another way to do it.
[1] ... which I don't really believe, of course: you can project 20 points out of a 50-dimensional embedding into an N=2 space and have the points land wherever you want!
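To make that concrete, here's a small sketch (random data, not actual GloVe vectors) showing that for 20 generic points in 50 dimensions you can solve for a linear 50-to-2 projection that sends them to essentially arbitrary target positions in the plane:

    import numpy as np

    rng = np.random.default_rng(1)

    X = rng.normal(size=(20, 50))          # 20 points in a 50-dim "embedding" space
    Y = rng.uniform(-1, 1, size=(20, 2))   # wherever we *want* them to land in 2D

    # Solve X @ W = Y for a 50x2 projection matrix W.  Because 20 < 50 and the
    # points are in general position, the system is underdetermined and an exact
    # solution exists; lstsq returns the minimum-norm one.
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)

    print(np.max(np.abs(X @ W - Y)))       # ~1e-15: the projection hits every target

So a pretty 2D scatter plot of a handful of words is weak evidence about the geometry of the full embedding space.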