
Comment by mirekrusin

5 hours ago

It works very well on dense models; IMHO it's a great alternative to MoE. Since verification is cheaper than generation, speculation could become a fundamental, first-class primitive: maybe even applied recursively, or used for live distillation during inference, etc.
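To make "verification is cheaper than generation" concrete, here is a minimal sketch of greedy speculative decoding. It assumes exact-match acceptance (not the sampling-based rejection scheme) and uses two hypothetical toy "models" (`toy_target`, `toy_draft`) that map a token prefix to a next token. A real target model scores all k drafted positions in one batched forward pass; the toy simulates that pass with k calls but counts it as one round.

```python
from typing import Callable, List, Tuple

# Hypothetical model type: maps a token prefix to the greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], n_new: int) -> List[int]:
    """Reference: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding sketch. Returns (tokens, target_rounds)."""
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < n_new:
        # 1. Draft k tokens cheaply, one at a time.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. Verify: in a real model this is ONE batched target pass over all
        #    k positions; here it is simulated with k calls but counted once.
        rounds += 1
        accepted: List[int] = []
        for i in range(k):
            expected = target(tokens + proposal[:i])
            accepted.append(expected)  # the target's own token is always kept
            if proposal[i] != expected:
                break  # first mismatch: discard the rest of the draft
        else:
            # All k drafts matched: the same pass also yields a bonus token.
            accepted.append(target(tokens + proposal))
        tokens.extend(accepted)
    return tokens[:len(prompt) + n_new], rounds


# Hypothetical toy models: the draft is wrong whenever len(prefix) % 5 == 0.
def toy_target(prefix: List[int]) -> int:
    return (sum(prefix) * 7 + 3) % 10


def toy_draft(prefix: List[int]) -> int:
    t = toy_target(prefix)
    return (t + 1) % 10 if len(prefix) % 5 == 0 else t


if __name__ == "__main__":
    out, rounds = speculative_decode(toy_target, toy_draft, [1, 2], n_new=20, k=4)
    print(out == greedy_decode(toy_target, [1, 2], 20), rounds)
```

The output is identical to plain greedy decoding of the target; only the number of expensive target rounds drops (here 5 rounds instead of 20 sequential calls), because every round commits at least one token and up to k + 1 when the draft agrees.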

MoE is more hardcoded and predetermined; speculation is much more dynamic and remains malleable after training.

This paper actually proposes, as future work, the direction of aligning the architecture to aid speculation.