Comment by fulafel
4 hours ago
Looks like DeepSeek has also been doing this since V3: https://deepwiki.com/deepseek-ai/DeepSeek-V3/4.4-multi-token...
Credit for the multi-token prediction (MTP) technique goes to https://arxiv.org/abs/2404.19737 from 2024:
"Better & Faster Large Language Models via Multi-token Prediction" by Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, Gabriel Synnaeve