Comment by chaorace

14 days ago

The "Experts" in MoE are less like a panel of doctors and more like different brain regions with interlinked yet specialized functions.

The models get trained largely the same way as non-MoE models, except that specific parts of the model are siloed apart past a certain layer. The shared part of the model, prior to the split, is the "router". The router learns how to route during training like any other part of the network, so whatever internal structure emerges from this is basically a black box.
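The routing idea described above can be sketched in a few lines. This is a toy illustration, not any real model's implementation: the router is a single linear scoring layer, the experts are random matrices, and all names (`W_router`, `W_experts`, `moe_layer`) are made up for the example. For each token, the router scores every expert, only the top-k experts actually run, and their outputs are mixed by softmaxed router scores.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

# Router: a learned linear layer that scores each expert per token
# (the "shared part" of the model).
W_router = rng.normal(size=(d_model, n_experts))
# Experts: independent weight matrices (the "siloed" parts).
W_experts = rng.normal(size=(n_experts, d_model, d_model))

def moe_layer(x):
    """Route a token to its top-k experts and mix their outputs."""
    logits = x @ W_router                      # one score per expert
    top = np.argsort(logits)[-top_k:]          # indices of the best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over chosen experts only
    # Only the chosen experts run; the rest stay idle for this token.
    return sum(w * (x @ W_experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=d_model)
out = moe_layer(token)
print(out.shape)  # (8,)
```

In real MoE transformers the router is trained jointly with the experts (often with extra load-balancing losses), which is why the specialization that emerges among experts is learned rather than designed.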