Comment by tsimionescu

2 years ago

It works by computing that P("queen" | "king - man + woman", corpus) > P(any other word | "king - man + woman", corpus), i.e. it predicts that, given the entire training corpus and the loss function, the most likely next token after that phrase is "queen".
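To make that concrete, here is a minimal sketch of what comparing next-token probabilities looks like for a causal LM. It assumes Hugging Face transformers with GPT-2 as a stand-in model and an illustrative prompt string; the actual model and prompt in question could be anything.

```python
# Sketch: compare P(candidate | prompt) for a few candidate next tokens.
# Assumes the `transformers` and `torch` packages and GPT-2 as a stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "king - man + woman ="          # hypothetical prompt for illustration
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # logits for the next token position
probs = torch.softmax(logits, dim=-1)        # probability distribution over the vocab

for word in [" queen", " king", " woman"]:
    token_id = tok(word, add_special_tokens=False)["input_ids"][0]
    print(word, float(probs[token_id]))
```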

Now, how exactly the LLM computes this prediction remains poorly understood.

word2vec, on the other hand, is pretty much completely understood. There may be some bounds that could be pushed lower, but we know exactly how it works.
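For comparison, the word2vec analogy itself is just vector arithmetic plus a nearest-neighbour lookup. A minimal sketch, assuming gensim and its downloadable "word2vec-google-news-300" pretrained vectors:

```python
# Sketch: the classic "king - man + woman ~ queen" analogy with word2vec vectors.
# Assumes the `gensim` package and its pretrained Google News vectors.
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")   # returns KeyedVectors

# Add/subtract the word vectors, then find the nearest neighbour by cosine
# similarity, excluding the input words themselves.
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)   # expected to be roughly [('queen', 0.71...)]
```

The point being: every step here (the vectors, the arithmetic, the cosine similarity) is fully inspectable, which is the sense in which word2vec is "understood" in a way the LLM's internal computation isn't.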