Comment by OneDeuxTriSeiGo
20 hours ago
As far as I can tell MTP is unique from regular speculative decode because the small model is trained to consume and operate on the big model's hidden state for prediction.
20 hours ago
As far as I can tell MTP is unique from regular speculative decode because the small model is trained to consume and operate on the big model's hidden state for prediction.
No comments yet
Contribute on Hacker News ↗