Comment by hansvm

1 month ago

Perhaps. Would you mind elaborating on what you're envisioning?

In both cases (auto-regressive vs. diffusive), you still have some process that's being followed, and the exact steps in that process matter to the result. If you constrain at each step then you get the equivalent of something like projected gradient descent (as an analogy) and aren't guaranteed the same solution. If you constrain as a post-processing phase then (a) diffusion wasn't required for the initial generation, and (b) that's still unlikely to converge to the same distribution (for similar reasons -- using my example of ellipsis errors, if you corrected that particular mistake in post then the closest valid messages to the initial generation are likely too short and thus still incorrect).
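
To make the per-step point concrete, here's a toy sketch (not taken from any real model; the two-token vocabulary, the probabilities, and the function names are all made up for illustration). It compares the distribution you'd want, i.e. the unconstrained model conditioned on validity, against what you get by masking invalid tokens at each step and renormalizing:

```python
from fractions import Fraction as F

# Hypothetical two-step autoregressive model over tiny vocabularies.
FIRST = {"a": F(1, 2), "b": F(1, 2)}
SECOND = {
    "a": {"x": F(9, 10), "y": F(1, 10)},
    "b": {"x": F(1, 10), "y": F(9, 10)},
}


def is_valid(seq):
    # Assumed constraint for this toy: the sequence must end in "y".
    return seq[1] == "y"


def unconstrained():
    # Joint distribution of the model with no constraint applied.
    return {(f, s): FIRST[f] * SECOND[f][s] for f in FIRST for s in SECOND[f]}


def conditioned():
    # Target: the model's own distribution restricted to valid sequences.
    joint = unconstrained()
    z = sum(p for seq, p in joint.items() if is_valid(seq))
    return {seq: p / z for seq, p in joint.items() if is_valid(seq)}


def stepwise_masked():
    # Constrain at each step: drop tokens that cannot lead to a valid
    # sequence, renormalize locally, then move on to the next step.
    allowed_first = {
        f: p for f, p in FIRST.items()
        if any(is_valid((f, s)) for s in SECOND[f])
    }
    z1 = sum(allowed_first.values())
    out = {}
    for f, pf in allowed_first.items():
        allowed_second = {s: p for s, p in SECOND[f].items() if is_valid((f, s))}
        z2 = sum(allowed_second.values())
        for s, ps in allowed_second.items():
            out[(f, s)] = (pf / z1) * (ps / z2)
    return out


if __name__ == "__main__":
    print("conditioned:    ", conditioned())
    print("stepwise masked:", stepwise_masked())
```

In this toy, `conditioned()` puts 1/10 on `("a", "y")` and 9/10 on `("b", "y")`, while `stepwise_masked()` puts 1/2 on each. Both only ever emit valid sequences, but the per-step version has no way to know at step one that "a" rarely leads to a valid ending, so it lands on a different distribution.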