Comment by OneDeuxTriSeiGo
18 hours ago
As far as I can tell MTP is unique from regular speculative decode because the small model is trained to consume and operate on the big model's hidden state for prediction.
18 hours ago
As far as I can tell MTP is unique from regular speculative decode because the small model is trained to consume and operate on the big model's hidden state for prediction.
No comments yet
Contribute on Hacker News ↗