Comment by lordswork

14 days ago

MoE as an idea specific to neural networks has been around since 1991 [1]. OP is probably aware, but adding for others following along: while MoE has roots in ensembling, there are some important differences. Traditional ensembles run all models in parallel and combine their outputs, whereas MoE uses a gating mechanism to activate only a subset of experts per input. This enables efficient scaling via conditional computation and expert specialization rather than redundancy.
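To make the gating idea concrete, here is a minimal sketch of top-k routing for a single token, in plain NumPy. The names (`W_gate`, `experts`, `moe_forward`) and the linear "experts" are toy placeholders of my own, not the formulation from the 1991 paper or any particular library:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 4, 2

# Toy "experts": each is just a weight matrix here
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
# Gating network: one linear layer producing a score per expert
W_gate = rng.normal(size=(d_model, n_experts))

def moe_forward(x):
    # x: (d_model,) hidden state for a single token
    scores = x @ W_gate                     # (n_experts,) gate scores
    top = np.argsort(scores)[-top_k:]       # indices of the top-k experts
    weights = softmax(scores[top])          # renormalize over the chosen experts
    # Only the selected experts run -- this is the conditional computation;
    # an ensemble would evaluate all n_experts and average them.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

y = moe_forward(rng.normal(size=d_model))
print(y.shape)  # (16,)
```

Per token, only `top_k` of the `n_experts` matrices are touched, so parameter count can grow much faster than the compute per forward pass, which is the scaling argument above.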

[1]: https://ieeexplore.ieee.org/document/6797059