Comment by shawntan
2 months ago
I'm curious how the speed is achieved if this is the technique used. Generally I'd expect this "masked language model" technique to be far slower, since the full vocab projection needs to be computed every iteration.
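To make the cost concern concrete, here's a toy sketch of the kind of masked-diffusion decode loop I mean (purely illustrative — the model forward, sizes, and unmasking rule are all made up, not anything known about Gemini Diffusion). Each iteration produces logits over the *entire* vocab at *every* position, and then only a few positions actually get unmasked:

```python
import numpy as np

rng = np.random.default_rng(0)
SEQ_LEN, VOCAB, HIDDEN = 16, 32000, 64
MASK = -1  # sentinel id for a still-masked position

# Stand-in for the final hidden->vocab layer; a real model would run a
# full transformer before this projection. The point is the output shape.
W_out = rng.standard_normal((HIDDEN, VOCAB)) / np.sqrt(HIDDEN)

def forward(tokens):
    # Fake "hidden states" — illustrative only.
    h = rng.standard_normal((len(tokens), HIDDEN))
    return h @ W_out  # shape (SEQ_LEN, VOCAB): full vocab projection

tokens = np.full(SEQ_LEN, MASK)
steps = 0
while (tokens == MASK).any():
    logits = forward(tokens)  # full vocab projection, every single step
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    conf = probs.max(axis=-1)
    conf[tokens != MASK] = -np.inf  # only consider still-masked slots
    K = 4  # unmask the K most confident positions per iteration
    idx = np.argsort(conf)[-K:]
    tokens[idx] = probs[idx].argmax(axis=-1)
    steps += 1

print(steps)  # 16 positions / 4 per step -> 4 iterations
```

So the (SEQ_LEN × VOCAB) projection is paid on every denoising step, even though each step only commits K tokens — which is why I'd naively expect this to be slower than autoregressive decoding with a KV cache, not faster.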
I always thought the eventual technique would be some form of diffusion in continuous space, then decoding into the discrete tokens.
Also I'm guessing this is a "best guess" of how Gemini Diffusion is done?