Comment by qoez
7 months ago
It's basically a mixture of experts but instead of a learned operator picking the predicted best model, you use a 'max' operator across all experts.
7 months ago
It's basically a mixture of experts but instead of a learned operator picking the predicted best model, you use a 'max' operator across all experts.
No comments yet
Contribute on Hacker News ↗