← Back to context

Comment by bjt12345

7 days ago

I do wonder why diffusion models aren't used alongside constraint decoding for programming - surely it makes better sense then using an auto-regressive model.

Diffusion models need to infer the causality of language from within a symmetric architecture (information can flow forward or backward). AR forces information to flow in a single direction and is substantially easier to control as a result. The 2nd sentence in a paragraph of English text often cannot come before the first or the statement wouldn't make sense. Sometimes this is not an issue (and I think these are cases where parallel generation makes sense), but the edge cases are where all the money lives.

  • But I do wonder if diffusion models will be used in more complex Software Architecture for their long-term coherence, no exposure bias, and their symmetric architecture could work well with interaction nets.