Comment by zozbot234
6 hours ago
I assume these are just extra output layers trained on the hidden state of the larger model; that's how multi-token prediction (MTP) works. It's not a separate drafting model.
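To make that concrete, here's a rough toy sketch of the idea as I understand it, not any particular model's actual implementation. All names and sizes (`heads`, `draft_tokens`, `HIDDEN`, `VOCAB`, `EXTRA_HEADS`) are made up for illustration, the weights are random rather than trained, and real MTP modules may be more elaborate than a single linear layer per head.

```python
import random

def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

def make_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

HIDDEN, VOCAB, EXTRA_HEADS = 8, 16, 3  # toy sizes, chosen arbitrarily

rng = random.Random(0)

# Stand-in for the large model's final hidden state at the current position.
hidden_state = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN)]

# Head 0 plays the role of the normal next-token output layer; heads 1..K
# are the extra output layers that guess tokens further ahead.
heads = [make_matrix(VOCAB, HIDDEN, rng) for _ in range(1 + EXTRA_HEADS)]

def draft_tokens(hidden, heads):
    """Every head reads the SAME hidden state; no second model is run."""
    return [argmax(matvec(head, hidden)) for head in heads]

draft = draft_tokens(hidden_state, heads)
print(draft)  # one guessed token id per head
```

The point is that the only extra cost per drafted token is one more output projection over a hidden state the big model already computed. In a speculative-decoding setup the guesses past the first would still be checked by the main model before being accepted.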