Comment by ramshanker
14 days ago
I have a gut feeling that next in line will be two or more levels of MoE, further reducing the memory bandwidth and compute requirements. The top-level MoE router would decide which sub-MoE to route to.
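For anyone wondering what that could look like, here is a rough sketch of two-level routing, not taken from any real model. It assumes top-1 selection at both levels and plain linear routers. The names `route_two_level`, `group_router`, and `expert_routers` are made up for illustration. It only covers the routing decision, not the experts themselves.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def route_two_level(x, group_router, expert_routers):
    """Pick a group (sub-MoE) first, then an expert inside that group.

    Hypothetical layout: group_router has one weight row per group,
    and expert_routers[g] has one weight row per expert in group g.
    """
    group_probs = softmax([dot(w, x) for w in group_router])
    g = argmax(group_probs)

    # Only the chosen group's router is evaluated.
    expert_probs = softmax([dot(w, x) for w in expert_routers[g]])
    e = argmax(expert_probs)

    # Assumed gate: product of the two routing probabilities.
    return g, e, group_probs[g] * expert_probs[e]


rng = random.Random(0)
dim, n_groups, experts_per_group = 4, 3, 5
group_router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_groups)]
expert_routers = [
    [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(experts_per_group)]
    for _ in range(n_groups)
]
x = [rng.gauss(0, 1) for _ in range(dim)]

g, e, gate = route_two_level(x, group_router, expert_routers)
print(f"group={g} expert={e} gate={gate:.3f}")
```

In this toy setup the token is scored against 3 + 5 = 8 router rows instead of the 15 a flat router over the same experts would need. Whether that translates into real bandwidth or compute savings would depend on how the experts are laid out in memory, which the sketch does not model.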
The solution to all problems in computer science is to add a new level of indirection (or abstraction).
Except when the solution is to collapse abstraction in the name of efficiency.