Comment by DiabloD3
3 days ago
MTP models share internal state with the main model, and also refer to parameters in the model.
They are more like a single model that has two separate attention head mechanisms.
3 days ago
MTP models share internal state with the main model, and also refer to parameters in the model.
They are more like a single model that has two separate attention head mechanisms.
No comments yet
Contribute on Hacker News ↗