Comment by mirekrusin

10 hours ago

If the “speculative” approach works so well in different contexts, why not make it first-class and use it everywhere, possibly recursively?

saagarjha 7 hours ago

Speculation is only worth it if you can profit from it. Not every context allows this or has a similar idea of what can be speculated.
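
A rough way to put numbers on "profit", as a sketch only: assume each drafted token is accepted independently with probability $\alpha$, that $k$ tokens are drafted per step, and that one draft call costs a fraction $c$ of a target call. All three symbols are assumptions introduced for illustration, not something from the thread.

```latex
\mathbb{E}[\text{tokens per target pass}] = \frac{1-\alpha^{k+1}}{1-\alpha},
\qquad
\text{speedup} \approx \frac{1-\alpha^{k+1}}{(1-\alpha)\,(kc+1)}
```

Under these assumptions, with $k=4$ and $c=0.1$, an acceptance rate of $\alpha=0.8$ gives a speedup of roughly 2.4x, while $\alpha=0.3$ gives roughly 1.02x, i.e. essentially no gain. That is one concrete sense in which speculation only pays off when the context is predictable enough.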

mirekrusin 3 hours ago

It works very well on dense models; imho it's a great alternative to MoE. Since verification is cheaper than generation, it could be a fundamental, first-class primitive. Maybe you could even recurse on it, do live distillation during inference, etc.
MoE is more hardcoded and predetermined; speculation is much more dynamic and stays malleable after training.
This paper actually proposes aligning the architecture to aid speculation as a direction for future work.
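
To make the "verification is cheaper than generation" point concrete, here is a minimal sketch of greedy speculative decoding with toy stand-in models. The names `toy_target`, `toy_draft`, `speculative_step`, `generate`, and `greedy` are all hypothetical, and the "models" are plain arithmetic functions rather than real networks. In a real system the target would check all drafted positions in one batched forward pass; here that is simulated with a loop.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]

def toy_target(ctx: List[int]) -> int:
    # Hypothetical stand-in for the large model.
    return (sum(ctx) * 31 + 7) % 50

def toy_draft(ctx: List[int]) -> int:
    # Hypothetical stand-in for the small model: agrees with the target
    # except when the prefix length is a multiple of 3.
    t = toy_target(ctx)
    return t if len(ctx) % 3 else (t + 1) % 50

def speculative_step(target: Model, draft: Model, prefix: List[int], k: int) -> List[int]:
    # 1) The draft proposes k tokens autoregressively.
    proposed: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        t = draft(ctx)
        proposed.append(t)
        ctx.append(t)

    # 2) The target verifies the proposals position by position.
    #    (Simulated sequentially; a real target scores them in one pass.)
    accepted: List[int] = []
    ctx = list(prefix)
    for t in proposed:
        expected = target(ctx)
        if expected != t:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(t)
        ctx.append(t)

    # 3) Every proposal matched, so the target contributes one extra token.
    accepted.append(target(ctx))
    return accepted

def generate(target: Model, draft: Model, prefix: List[int], n_tokens: int, k: int = 4) -> List[int]:
    out = list(prefix)
    while len(out) - len(prefix) < n_tokens:
        out.extend(speculative_step(target, draft, out, k))
    return out[: len(prefix) + n_tokens]

def greedy(target: Model, prefix: List[int], n_tokens: int) -> List[int]:
    # Plain one-token-at-a-time decoding with the target, for comparison.
    out = list(prefix)
    for _ in range(n_tokens):
        out.append(target(out))
    return out
```

The property worth noticing is that `generate` returns exactly what `greedy` would with the target alone. The draft only affects how many tokens each verification step yields (between 1 and `k + 1`), never which tokens come out.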

doctorpangloss 1 hour ago

Multi-token prediction is a good enhancement to training. It isn't necessarily useful for inference. Other speculative decoding methods, like EAGLE, are. Whether it helps is specific to the technique, and the authors of these methods write about it.