Comment by qoez
3 days ago
It's basically a mixture of experts but instead of a learned operator picking the predicted best model, you use a 'max' operator across all experts.
3 days ago
It's basically a mixture of experts but instead of a learned operator picking the predicted best model, you use a 'max' operator across all experts.
No comments yet
Contribute on Hacker News ↗