Comment by hedgehog

1 day ago

Most of the work I'm aware of starts from the perspective of optimizing inference, but the idea of pushing those lessons upstream gets mentioned here and there.

Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models (https://arxiv.org/abs/2505.16056)

Breaking the MoE LLM Trilemma: Dynamic Expert Clustering with Structured Compression (https://arxiv.org/abs/2510.02345)