← Back to context

Comment by Reubend

2 days ago

After trying to understand their method, I think you're right. Doesn't seem like anything that I would personally call "diffusion". Much closer to MTP + speculative decoding.

Then again, their results with it are great. It would be interesting to benchmark it against standard SD on a model that already uses MTP.

Yeah, I think it's a super neat way to do MTP. Conceptually much more pleasing and simple than existing methods. Especially since this way scaling `k` as models get better will be easier. Wish it had been presented as such.