
Comment by deskamess

19 hours ago

Did DeepSeek come up with MTP? It was listed prominently in their recent paper as being carried forward from the previous release.

I think this is mixing two separate ideas. MTP is the training-side piece; speculative decoding is the inference trick. DeepSeek V3 used MTP as an auxiliary training loss, the 2022 Google paper introduced speculative decoding, and now Google is combining them. https://arxiv.org/abs/2404.19737

  • Oh... so MTP is not speculative decoding? The (T)oken (P)rediction made me think it was on the inference side. I shall read the paper.

    Edit: Ok, I understand now. You are saying that MTP has two aspects: 1) the training side, where the mini-models learn to generate draft tokens, and 2) the actual speculative decoding implementation on the inference side, which uses those trained mini-models.
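To make the inference-side half of that concrete, here is a minimal sketch of greedy speculative decoding. The `draft_model` and `target_model` functions are toy stand-ins (deterministic arithmetic over token sequences), not anyone's actual implementation; the point is just the draft-then-verify loop, where the small model proposes several tokens cheaply and the large model checks them:

```python
# Minimal greedy speculative decoding sketch.
# `draft_model` / `target_model` are hypothetical toy next-token functions.

def target_model(tokens):
    # Toy "large" model: next token is (sum of context) mod 7.
    return sum(tokens) % 7

def draft_model(tokens):
    # Toy "small" model: agrees with the target only when the context sum is odd.
    s = sum(tokens)
    return s % 7 if s % 2 else (s + 1) % 7

def speculative_decode(prompt, n_new, k=4):
    tokens = list(prompt)
    while len(tokens) < len(prompt) + n_new:
        # 1) Draft: the small model proposes k tokens cheaply, one at a time.
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify: the large model checks every drafted position
        #    (one parallel forward pass in a real system; a loop here).
        accepted, ctx = [], list(tokens)
        for t in draft:
            expected = target_model(ctx)
            if t == expected:
                accepted.append(t)
                ctx.append(t)
            else:
                # First mismatch: keep the target model's token and stop.
                accepted.append(expected)
                break
        else:
            # All k drafts accepted: take one bonus token from the target.
            accepted.append(target_model(ctx))
        tokens.extend(accepted)
    return tokens[:len(prompt) + n_new]

def plain_decode(prompt, n_new):
    # Ordinary greedy decoding with the target model, for comparison.
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_model(tokens))
    return tokens

# Greedy speculative decoding is exact: the output matches plain greedy
# decoding with the target model, just produced in fewer target calls.
print(speculative_decode([1, 2, 3], 8) == plain_decode([1, 2, 3], 8))
```

The key property is that verification makes the output identical to what the target model alone would produce; the draft model only affects speed (how many tokens get accepted per target pass), which is why the training-side MTP heads and the inference-side decoding loop are separable pieces.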