Comment by brookst
4 days ago
No, the vector is in a semantic embedding space. That's the magic.
So "the sky is blue" converts to the tokens [1820, 13180, 374, 6437]
And "le ciel est bleu" converts to the tokens [273, 12088, 301, 1826, 12704, 84]
Then the embedding vectors created from these are very similar, despite the letters having very little in common.
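A minimal sketch of what "very similar" means here, assuming cosine similarity as the comparison. The token lists are the ones quoted above; the 4-dimensional vectors are made-up toy values for illustration only, not the output of any real embedding model, and `emb_other` is a hypothetical unrelated sentence added for contrast.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def jaccard(a, b):
    """Share of distinct items that two sequences have in common."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

# Token IDs as quoted in the comment above.
tokens_en = [1820, 13180, 374, 6437]               # "the sky is blue"
tokens_fr = [273, 12088, 301, 1826, 12704, 84]     # "le ciel est bleu"

# ASSUMPTION: hand-written toy vectors, not real model output.
emb_en = [0.81, 0.10, 0.55, 0.12]
emb_fr = [0.78, 0.14, 0.58, 0.09]
emb_other = [0.05, 0.92, 0.10, 0.40]  # hypothetical unrelated sentence

token_overlap = jaccard(tokens_en, tokens_fr)
sim_translation = cosine_similarity(emb_en, emb_fr)
sim_unrelated = cosine_similarity(emb_en, emb_other)

print(f"token overlap:          {token_overlap:.2f}")
print(f"similarity (en vs fr):  {sim_translation:.3f}")
print(f"similarity (en vs other): {sim_unrelated:.3f}")
```

With these toy numbers the two token lists share no IDs at all, while the two translation vectors point in nearly the same direction and the unrelated one does not. A real model would produce vectors with hundreds or thousands of dimensions, but the comparison works the same way.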
Which character sits in 1st/2nd/3rd place is part of the semantic space, in the general sense of the term. I ran experiments that seem to roughly support my hypothesis below.