Comment by brookst

4 days ago

No, the vector is in a semantic embedding space. That's the magic.

So "the sky is blue" converts to the tokens [1820, 13180, 374, 6437]

And "le ciel est bleu" converts to the tokens [273, 12088, 301, 1826, 12704, 84]

Then the embedding vectors created from these are very similar, even though the two sentences have very few letters in common.
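To make the distinction concrete, here is a minimal sketch. It first checks that the two token ID lists quoted above share no IDs at all. It then shows how "very similar" is usually measured for embeddings, using cosine similarity. The embedding vectors here are hypothetical toy values I made up for illustration, not output from any real model. The sketch also assumes each sentence has been reduced to a single vector, for example by a sentence-embedding model. Real embeddings have hundreds or thousands of dimensions, and the exact similarity depends on the model.

```python
import math

# Token IDs quoted in the comment above.
sky_en = [1820, 13180, 374, 6437]            # "the sky is blue"
sky_fr = [273, 12088, 301, 1826, 12704, 84]  # "le ciel est bleu"

# Surface form: the two token sequences have no IDs in common.
shared = set(sky_en) & set(sky_fr)


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# HYPOTHETICAL 4-d sentence embeddings, invented for illustration only.
emb_en = [0.81, 0.12, -0.40, 0.33]           # stand-in for "the sky is blue"
emb_fr = [0.78, 0.15, -0.37, 0.30]           # stand-in for "le ciel est bleu"
emb_unrelated = [-0.20, 0.90, 0.35, -0.10]   # stand-in for an unrelated sentence

print("shared token IDs:", shared)
print("en vs fr:", round(cosine_similarity(emb_en, emb_fr), 3))
print("en vs unrelated:", round(cosine_similarity(emb_en, emb_unrelated), 3))
```

The point is that similarity is judged on the vectors, not on the token IDs or the letters. Two translations can point in nearly the same direction even though their token sequences are completely disjoint.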

Which character is in 1st/2nd/3rd place is part of the semantic space, in the generic meaning of the word. I ran experiments that seem to roughly support my hypothesis below.