
Comment by mirekrusin

10 hours ago

If the “speculative” approach works so well in different contexts, why not make it first-class and use it everywhere, possibly recursively?

Speculation is only worth it if you can profit from it. Not every context allows this, or has a comparable notion of what can be speculated on.
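One way to make "only worth it if you can profit from it" concrete is the expected-speedup estimate from the original speculative decoding analysis (Leviathan et al., 2023). This is a minimal sketch under that analysis's simplifying assumption that each drafted token is accepted independently with probability `alpha`. `gamma` is the number of drafted tokens per step, `c` is the cost of one draft pass relative to one target pass, and the function names are illustrative only.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens produced per target-model verification pass.

    Assumes each of the gamma drafted tokens is accepted independently
    with probability alpha (the simplifying i.i.d. assumption).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # Every draft is accepted, plus the one token the target always adds.
        return float(gamma + 1)
    # Geometric series: 1 + alpha + ... + alpha**gamma.
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def expected_speedup(alpha: float, gamma: int, c: float) -> float:
    """Expected walltime speedup over plain autoregressive decoding.

    One speculative step costs gamma draft passes (c each) plus one
    target pass, i.e. gamma * c + 1 in units of a target pass.
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * c + 1.0)
```

With `alpha = 0` every step still pays for `gamma` draft passes but yields only the one token the target would have produced anyway, so the "speedup" drops below 1. The profit depends on how predictable the context is (high `alpha`) and how cheap the drafter is (low `c`).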

  • It works very well on dense models; imho it's a great alternative to MoE. Since verification is cheaper than generation, it could be a fundamental, first-class primitive, maybe even applied recursively, with live distillation during inference, etc.

    MoE is more hardcoded and predetermined; speculation is much more dynamic and malleable after training.

    This paper actually proposes aligning the architecture to aid speculation as a direction for future work.
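The claim that "verification is cheaper than generation" rests on the draft-then-verify loop. Below is a minimal sketch of that loop, assuming greedy decoding and toy `NextToken` functions that stand in for a small draft model and a large target model. These functions are hypothetical, and the sketch is not EAGLE or any specific paper's method.

```python
from typing import Callable, List

# Hypothetical greedy "model": maps a token prefix to the next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n: int) -> List[int]:
    """Plain autoregressive decoding: one target call per new token."""
    out = list(prompt)
    for _ in range(n):
        out.append(target(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n: int, gamma: int = 4) -> List[int]:
    """Greedy speculative decoding: draft gamma tokens, then verify them.

    In a real system the target evaluations in the verify loop happen in
    one batched forward pass; here they are separate calls for clarity.
    """
    out = list(prompt)
    goal = len(prompt) + n
    while len(out) < goal:
        # 1. Draft: the cheap model proposes gamma tokens autoregressively.
        proposal: List[int] = []
        ctx = list(out)
        for _ in range(gamma):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. Verify: keep drafts while they match the target's greedy choice.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if expected != tok:
                # First mismatch: take the target's token instead and stop.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # All drafts accepted: the same pass yields one bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    # The last step may overshoot; trim to exactly n new tokens.
    return out[:goal]
```

Every token kept is the target's own greedy choice, so the output matches `greedy_decode` no matter how bad the draft is. Draft quality only changes how many tokens each verification pass yields, which is what makes speculation adjustable after training.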

Multi-token prediction is a good enhancement to training, but it isn't necessarily useful for inference. Other speculative decoding methods, like EAGLE, are. It is specific to the technology, and the authors of these methods write about it.