Comment by schmorptron
6 hours ago
What would a diffusing reasoning model look like? have a pre-defined length [thinking] block that gets diffused over a long time, and then the final output block uses what is in that thinking block as part of its input? And how do diffusion models decide the output length in the first place, is it a pre-set parameter? or does it diffuse an [end] token into the middle somewhere?
got one answer by reading the rest of the comments, makes sense that the diffusion process is inherently reasoning-like: https://www.inceptionlabs.ai/blog/introducing-mercury-2