Comment by benob
2 months ago
I guess autoregressive LLMs can be finetuned (or continually pretrained) to do inference using diffusion. We've seen a recent paper (which I don't remember) training from scratch, but that seems like overkill. Does Google say how they did it?
Also, does diffusion have the potential to increase the speed of CPU-only inference?
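(For readers unfamiliar with the idea: the speed argument for diffusion-style text generation is usually that several tokens are committed per denoising step, so far fewer forward passes are needed than one-token-at-a-time autoregressive decoding. Below is a minimal toy sketch of that kind of masked-diffusion decoding loop: start from an all-MASK sequence and iteratively unmask the most confident positions. This is NOT Google's method; TinyDenoiser, MASK_ID, and all sizes are made-up placeholders standing in for a real finetuned bidirectional model.)

    # Toy sketch of confidence-based iterative unmasking (MaskGIT-style),
    # assuming a denoiser that maps token ids to per-position logits.
    import torch
    import torch.nn as nn

    VOCAB, MASK_ID, SEQ_LEN, STEPS = 100, 0, 16, 4

    class TinyDenoiser(nn.Module):  # placeholder for a real finetuned LM
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(VOCAB, 32)
            self.out = nn.Linear(32, VOCAB)
        def forward(self, ids):  # (batch, seq) -> (batch, seq, vocab)
            return self.out(self.emb(ids))

    @torch.no_grad()
    def diffusion_decode(model, seq_len=SEQ_LEN, steps=STEPS):
        ids = torch.full((1, seq_len), MASK_ID)              # start fully masked
        for step in range(steps):
            probs = model(ids).softmax(-1)
            conf, pred = probs.max(-1)                        # per-position confidence
            still_masked = ids.eq(MASK_ID)
            conf = conf.masked_fill(~still_masked, -1.0)      # only pick masked slots
            # unmask roughly an equal share of remaining positions each step
            k = max(1, still_masked.sum().item() // (steps - step))
            topk = conf.topk(k, dim=-1).indices
            ids[0, topk[0]] = pred[0, topk[0]]                # commit k tokens at once
        return ids

    print(diffusion_decode(TinyDenoiser()))  # untrained toy, so output is noise

With a trained model, the whole sequence is produced in STEPS forward passes instead of SEQ_LEN, which is where any CPU-side speedup would have to come from.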