Comment by killerstorm

2 days ago

Perhaps some radical MoE where you download _exactly_ the components you need, as you need them. Currently, MoE routing is usually done on a per-token, per-layer basis, so you need all the weights locally. But Apple, for example, made one that pre-selects all experts based on the prompt embedding. That might be scaled up further, e.g. by predicting exactly which experts are needed before fetching anything.
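To make the difference concrete, here is a rough toy sketch, not Apple's actual method or any real model's router. Everything in it is an assumption for illustration: the random router weights, the sizes, the helper names (`per_token_experts`, `prompt_level_experts`), and the use of the mean token embedding as the "prompt embedding". It also reuses the same token vector at every layer, whereas a real model would route on the evolving hidden state.

```python
import random

N_LAYERS, N_EXPERTS, DIM, TOP_K = 4, 16, 8, 2  # hypothetical toy sizes


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(scores, k):
    """Indices of the k largest scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def per_token_experts(token_vecs, routers, k):
    """Usual routing: every (token, layer) pair picks its own top-k experts.

    Returns the set of (layer, expert) pairs touched across the whole prompt.
    """
    needed = set()
    for vec in token_vecs:
        for layer, router in enumerate(routers):
            scores = [dot(vec, w) for w in router]
            needed.update((layer, e) for e in top_k(scores, k))
    return needed


def prompt_level_experts(token_vecs, routers, k):
    """Pre-selection: route once per layer on a single prompt embedding.

    Assumption: the prompt embedding is just the mean of the token vectors.
    """
    dim = len(token_vecs[0])
    mean = [sum(v[d] for v in token_vecs) / len(token_vecs) for d in range(dim)]
    needed = set()
    for layer, router in enumerate(routers):
        scores = [dot(mean, w) for w in router]
        needed.update((layer, e) for e in top_k(scores, k))
    return needed


rng = random.Random(0)
# One router per layer: one weight vector per expert (random stand-ins).
routers = [
    [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_EXPERTS)]
    for _ in range(N_LAYERS)
]
tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(32)]

per_token = per_token_experts(tokens, routers, TOP_K)
preselected = prompt_level_experts(tokens, routers, TOP_K)

print(f"per-token routing touches {len(per_token)} of {N_LAYERS * N_EXPERTS} experts")
print(f"prompt-level pre-selection needs {len(preselected)}")
```

With pre-selection the set of experts is fixed at `N_LAYERS * TOP_K` before generation starts, so only those weights would have to be downloaded. With per-token routing the union grows with the prompt and tends toward every expert, which is why all the weights have to be local.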

2 comments

eblanshey 1 day ago

I don't understand why no labs create dedicated models per industry or field of expertise, e.g. physics, electronics, chemistry, etc. Each model would be much smaller and better suited to running locally. Everyone is trying to cram everything into a single model.

salter2 2 days ago

Perhaps something similar to speculative decoding.

Speculating Experts Accelerates Inference for Mixture-of-Experts: https://arxiv.org/abs/2603.19289