Comment by elif

1 day ago

It's the same thing. Quantize your parameters? "Bigger" model runs faster. MOE base model distillation? "Bigger" model runs as smaller model.

There is no gain for anyone anywhere by reducing parameter count overall if that's what you mean. That sounds more like you don't like transformer models than a real performance desire