Comment by LarsDu88
2 hours ago
Does anyone know the current intrinsic limitations of diffusion text models compared to autoregressive ones?
I ran this question by ChatGPT and Claude, and they came up with limitations in GRPO RLVR, but I'm not sure.
CoT legibility largely disappears, which is quite concerning from a safety perspective.
The intrinsic limitation of text diffusion is that natural text contains serial dependencies: a word at the beginning of the text strongly influences what comes later. If a dependency chain within a diffusion block is long enough, the small number of diffusion steps may not be enough to resolve all of the dependencies, and you end up with incoherent output.
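
To make the dependency-chain argument concrete, here is a rough toy sketch in Python. It is not how any real diffusion language model works. It assumes, purely for illustration, that token i can only be settled correctly once token i-1 has been settled, and that every parallel step updates all positions from the same snapshot. The function name `parallel_resolve` and its parameters are made up for this example.

```python
def parallel_resolve(chain_length: int, num_steps: int) -> int:
    """Count how many tokens of a dependency chain are settled after num_steps.

    Toy assumption (not a real diffusion model): token i can only be settled
    once token i-1 is settled, and each parallel step updates every position
    using the state left by the previous step.
    """
    resolved = [False] * chain_length
    for _ in range(num_steps):
        snapshot = resolved[:]  # all positions read the same earlier state
        for i in range(chain_length):
            # The first token has no dependency; every other token needs its
            # predecessor to have been settled in an earlier step.
            if i == 0 or snapshot[i - 1]:
                resolved[i] = True
    return sum(resolved)


if __name__ == "__main__":
    chain = 8
    for steps in (2, 4, 8):
        done = parallel_resolve(chain, steps)
        print(f"{steps} parallel steps settle {done} of {chain} chained tokens")
```

Under these assumptions, a chain of length 8 needs 8 parallel steps to settle fully, so a 4-step budget leaves half of the chain unresolved. An autoregressive decoder spends one step per token by construction, so it never runs short on a chain like this. Real models can resolve several dependent tokens per step, so treat this as a picture of the worst case, not a measurement.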