Comment by regularfry

12 hours ago

This is a different model with, confusingly, approximately the same number of params as the existing gemma4 MoE. Unclear from a quick scan whether one was trained somehow from the other.

The mechanism isn't the same as speculative decoding. Speculative decoding runs sequentially and (usually) only a couple of tokens at a time; diffusion doesn't, and instead fills in a whole block of text at once. I haven't read the collateral yet, but my assumption would be that it's trained to keep the selected experts stable across a diffusion block.
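
To make the contrast concrete, here is a toy sketch of the two decoding loops. Nothing in it is taken from the model itself: `TARGET`, `speculative_decode`, `block_diffusion_decode`, the 70% draft accuracy and the halving unmask schedule are all made up for illustration, characters stand in for tokens, and expert routing is not modelled at all.

```python
import random

MASK = "_"
# Hypothetical stand-in for whatever the model "wants" to emit; one char = one token.
TARGET = list("the quick brown fox")


def speculative_decode(draft_len=2, seed=0):
    """Toy speculative decoding: a draft guesses a few tokens, then the
    target checks them left to right and keeps the longest correct prefix."""
    rng = random.Random(seed)
    out, rounds = [], 0
    while len(out) < len(TARGET):
        rounds += 1
        end = min(len(out) + draft_len, len(TARGET))
        # Assumed draft quality: each guess is right 70% of the time.
        proposal = [TARGET[i] if rng.random() < 0.7 else "?"
                    for i in range(len(out), end)]
        for tok in proposal:
            if tok != TARGET[len(out)]:
                # First wrong guess: the target supplies the token and the
                # rest of the draft is thrown away.
                out.append(TARGET[len(out)])
                break
            out.append(tok)
    return "".join(out), rounds


def block_diffusion_decode(block_len=8, steps=4, seed=0):
    """Toy block diffusion: start from a fully masked block and unmask
    positions anywhere in the block over a few parallel passes."""
    rng = random.Random(seed)
    out, passes = [], 0
    for start in range(0, len(TARGET), block_len):
        goal = TARGET[start:start + block_len]
        block = [MASK] * len(goal)
        for step in range(steps):
            masked = [i for i, tok in enumerate(block) if tok == MASK]
            if not masked:
                break
            passes += 1
            # Assumed schedule: reveal about half of what is left per pass,
            # and everything on the final pass.
            count = len(masked) if step == steps - 1 else max(1, len(masked) // 2)
            for i in rng.sample(masked, count):
                block[i] = goal[i]  # positions are filled in no fixed order
        out.extend(block)
    return "".join(out), passes


if __name__ == "__main__":
    print(speculative_decode())
    print(block_diffusion_decode())
```

The first loop only ever extends the text from the left, a few tokens per round. The second commits to a whole block up front and refines every position in it together, which is the difference described above.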