Comment by yorwba

3 days ago

Since the sampling probability is 1/K independent of the input, you don't need to compute K different intermediate outputs at each layer during inference. You can instead decide ahead of time which of the outputs you want to use and only compute that one.

(This is mentioned in Q1 in the "Common Questions About DDN" section at the bottom.)
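
To make the saving concrete, here is a rough sketch. It is not DDN's actual code: `toy_head`, `K = 4` and `NUM_LAYERS = 3` are made-up stand-ins for a real layer's output heads. Because each index is drawn uniformly from K choices no matter what the input is, the whole index path can be drawn before any layer runs, and each layer then evaluates one head instead of K.

```python
import random

K = 4           # candidate outputs per layer (illustrative value)
NUM_LAYERS = 3  # illustrative depth


def toy_head(layer, k, x):
    """Hypothetical stand-in for the k-th output head of a layer."""
    return x + (k - (K - 1) / 2) / (2 ** layer)


def sample_naive(rng, x=0.0):
    """Compute all K candidates per layer, then keep one uniformly at random."""
    evals = 0
    for layer in range(NUM_LAYERS):
        candidates = [toy_head(layer, k, x) for k in range(K)]
        evals += K
        x = candidates[rng.randrange(K)]
    return x, evals


def sample_preselected(rng, x=0.0):
    """Draw the index path up front, then compute only the chosen head per layer."""
    path = [rng.randrange(K) for _ in range(NUM_LAYERS)]
    evals = 0
    for layer, k in enumerate(path):
        x = toy_head(layer, k, x)
        evals += 1
    return x, evals
```

With the same seed both functions return the same sample, but the second evaluates `NUM_LAYERS` heads instead of `K * NUM_LAYERS`.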

You don't get to do that for conditional generation, though. When there is a target, you have to generate multiple outputs, pick the one closest to the target, and discard the rest.
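
A rough sketch of why the shortcut breaks here, again with a hypothetical `toy_head` rather than a real DDN layer, and with absolute difference on scalars assumed as the distance: the chosen index depends on the candidates themselves, so every candidate has to be computed before one can be picked.

```python
K = 4           # candidate outputs per layer (illustrative value)
NUM_LAYERS = 3  # illustrative depth


def toy_head(layer, k, x):
    """Hypothetical stand-in for the k-th output head of a layer."""
    return x + (k - (K - 1) / 2) / (2 ** layer)


def generate_toward(target, x=0.0):
    """At each layer, compute all K candidates and keep the one nearest the target."""
    path, evals = [], 0
    for layer in range(NUM_LAYERS):
        candidates = [toy_head(layer, k, x) for k in range(K)]
        evals += K  # none of these can be skipped: the choice depends on their values
        k_best = min(range(K), key=lambda k: abs(candidates[k] - target))
        path.append(k_best)
        x = candidates[k_best]  # the other K - 1 candidates are discarded
    return x, path, evals
```

So the conditional path costs `K * NUM_LAYERS` head evaluations in this sketch, versus `NUM_LAYERS` when the indices can be fixed in advance.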