Comment by hedgehog

3 days ago

I don't have links handy but there is active research in this area.

I'd love any keywords to search for to find active research on this topic!

  • Most of the work I'm aware of starts from the perspective of optimizing inference, but the implication that those lessons could be pushed upstream gets mentioned here and there.

    Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models (https://arxiv.org/abs/2505.16056)

    Breaking the MoE LLM Trilemma: Dynamic Expert Clustering with Structured Compression (https://arxiv.org/abs/2510.02345)