Comment by bob1029
5 days ago
Diffusion models need to infer the causality of language from within a symmetric architecture (information can flow forward or backward). AR forces information to flow in a single direction and is substantially easier to control as a result. The 2nd sentence in a paragraph of English text often cannot come before the first or the statement wouldn't make sense. Sometimes this is not an issue (and I think these are cases where parallel generation makes sense), but the edge cases are where all the money lives.
But I do wonder if diffusion models will be used in more complex Software Architecture for their long-term coherence, no exposure bias, and their symmetric architecture could work well with interaction nets.