Comment by yorwba

8 hours ago

The intrinsic limitation of text diffusion is that natural text contains serial dependencies: a word at the beginning of the text strongly influences what comes later. If there is a long enough dependency chain within a diffusion block, the small number of diffusion steps may not be enough to resolve all the dependencies, and you end up with incoherent output.

The obvious solution is to simply do more steps for larger sequences though, right?
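To make the dependency argument concrete, here is a toy sketch in Python. It is not how any real diffusion language model works; it assumes a deliberately simplified rule in which each step updates all tokens in parallel, but a token can only be fixed once the token it depends on was already fixed in an earlier step. The function name `resolve_chain` and its parameters are hypothetical, made up for illustration.

```python
def resolve_chain(chain_length: int, num_steps: int) -> int:
    """Count how many tokens of a serial dependency chain get resolved.

    Toy assumption (not a real diffusion sampler): token i can be fixed in
    a step only if token i-1 was already fixed before that step started.
    Token 0 depends on nothing.
    """
    resolved = [False] * chain_length
    for _ in range(num_steps):
        # All updates within one step happen in parallel, so each token
        # only sees the state left behind by the previous step.
        snapshot = list(resolved)
        for i in range(chain_length):
            if i == 0 or snapshot[i - 1]:
                resolved[i] = True
    return sum(resolved)


# A chain of 8 dependent tokens with only 4 steps leaves half unresolved.
print(resolve_chain(chain_length=8, num_steps=4))
# Matching the step count to the chain length resolves everything.
print(resolve_chain(chain_length=8, num_steps=8))
```

Under this toy rule, doing more steps for longer chains does fix the problem, but the required step count grows linearly with the chain length, which is the same serial cost that autoregressive decoding pays.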

How exactly does this work with chain-of-thought (CoT) reasoning?