No, Mixture of Experts is a really confusing term.
It sounds like it means "have a bunch of models, one that's an expert in physics, one that's an expert in health etc, and then pick the one that's the best fit for the user's query".
It's not that. The "experts" are each just another giant opaque blob of weights. A learned router picks which of those blobs each token is passed through, but the blobs don't have any human-understandable "expertise". It's an optimization that lets you avoid running ALL of the weights for every pass through the model, which helps with performance.
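For intuition, here's a toy sketch of an MoE layer (plain PyTorch, heavily simplified, not how any production model actually implements it):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ToyMoELayer(nn.Module):
        def __init__(self, d_model=64, n_experts=8, top_k=2):
            super().__init__()
            # Each "expert" is just an ordinary feed-forward block of weights
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                              nn.Linear(4 * d_model, d_model))
                for _ in range(n_experts)
            ])
            # The router is a small learned layer that scores experts per token
            self.router = nn.Linear(d_model, n_experts)
            self.top_k = top_k

        def forward(self, x):                # x: (tokens, d_model)
            scores = self.router(x)          # (tokens, n_experts)
            weights, idx = scores.topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)
            out = torch.zeros_like(x)
            # Only the chosen experts run for each token; the rest are skipped,
            # which is where the compute savings come from
            for k in range(self.top_k):
                for e in range(len(self.experts)):
                    mask = idx[:, k] == e
                    if mask.any():
                        out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
            return out

    # usage: ToyMoELayer()(torch.randn(10, 64))

Real implementations add things like load-balancing losses and batched expert dispatch, but the core idea is just that: a learned router deciding which blobs of weights each token flows through.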
https://huggingface.co/blog/moe#what-is-a-mixture-of-experts... is a decent explanation.