Comment by crubier

3 months ago

I can believe that your abstract thoughts in latent space are diffusing/forming progressively when you are thinking.

But I can't believe the actual literal words are diffusing when you're thinking.

When asked "How are you today?", there is no way that your thoughts are literally "Alpha zulu banana" => "I banana coco" => "I banana good" => "I am good". The diffusion does not happen at the output token layer; it happens much earlier, at a higher level of abstraction.

Or like this:

"I ____ ______ ______ ______ and _____ _____ ______ ____ the ____ _____ _____ _____."

If the images in the article are an accurate representation, the model is putting down meaningless bits of connective tissue well before the actual ideas. Maybe it doesn't really work like that. But the "token-at-a-time" model is obviously not literally looking at only one word at a time either.
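
To make the fill-in-the-blanks picture concrete, here's a toy Python sketch. It is purely illustrative: `FUNCTION_WORDS` and `mask_content_words` are hypothetical names, the word list is arbitrary, and none of this is meant to describe how a real diffusion model decides what to fill in first. It only shows what an intermediate step would look like if the connective words really did land before the ideas.

```python
# Toy illustration only. The function-word list is an arbitrary assumption,
# and this is NOT how any real diffusion language model orders its output.
FUNCTION_WORDS = {"i", "and", "the", "a", "of", "to", "in"}


def mask_content_words(sentence: str) -> str:
    """Blank out every word that is not in FUNCTION_WORDS.

    Each masked word becomes a run of underscores of the same length;
    leading/trailing punctuation on the word is left in place.
    """
    out = []
    for word in sentence.split():
        core = word.strip(".,!?")  # the word without surrounding punctuation
        if core.lower() in FUNCTION_WORDS:
            out.append(word)  # connective word: keep as-is
        else:
            out.append(word.replace(core, "_" * len(core)))  # content word: mask
    return " ".join(out)


if __name__ == "__main__":
    print(mask_content_words("I went home and read the book."))
    # prints: I ____ ____ and ____ the ____.
```

The point of the toy is just that the skeleton left behind carries almost no meaning on its own, which is why it seems odd for it to appear first.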