← Back to context

Comment by fulafel

3 hours ago

Multi-token prediction is a refined form of speculative decoding.

Researchers at Google came up with Speculative decoding in 2022: https://research.google/blog/looking-back-at-speculative-dec... (Fast Inference from Transformers via Speculative Decoding - Yaniv Leviathan, Matan Kalman, Yossi Matias)

Researchers at Meta came up with MTP, a smarter way of doing speculative decoding in 2024: https://arxiv.org/abs/2404.19737 (Better & Faster Large Language Models via Multi-token Prediction Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, Gabriel Synnaeve)

DeepSeek V3 shipped MTP in a product first, in 2024: https://arxiv.org/abs/2412.19437 (DeepSeek-V3 Technical Report, 100+ authors)