Comment by cttet

3 days ago

Oh, it is the selected output. Yes, that is what I was a bit confused about. So in the initial design, when you first tried it, did you already pass both to the next layer? Or is that something you found to perform better later?

Even in the earliest stages of the DDN concept, we had already decided to pass features down to the next layer.

I never even ran an ablation that disabled the stem features; I assume the network would still train without them, but since the previous layer has already computed the features, it would be wasteful not to reuse them. Retaining the stem features also lets DDN adopt the more efficient single-shot-generator architecture.

Another, deeper reason is that, unlike diffusion models, DDN does not need the Markov-chain property between adjacent layers.
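To make the data flow concrete, here is a minimal toy sketch in plain Python. It is not the DDN implementation. `ToyDDLayer`, `run`, the additive fusion of features and output, and the fixed random offsets standing in for the K candidate heads are all hypothetical simplifications. The sketch also assumes the selection rule is "closest candidate to the target" when a target is given and a random pick otherwise. What it illustrates is the point above: every layer receives both the feature vector computed upstream and the selected output. A layer's input is therefore not just the previously chosen sample, and all layers run in one forward loop.

```python
import random
from typing import List, Optional, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyDDLayer:
    """Hypothetical stand-in for one DDN layer (NOT the real DDN code).

    It holds K fixed offset vectors. Each candidate output is the fused
    feature vector plus one offset.
    """

    def __init__(self, dim: int, k: int, seed: int) -> None:
        rng = random.Random(seed)
        self.k = k
        self.offsets = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(k)]

    def forward(
        self,
        feat: Vector,
        prev_out: Vector,
        target: Optional[Vector],
        rng: random.Random,
    ) -> Tuple[Vector, Vector, int]:
        # Reuse the features the previous layer already computed, combined
        # with its selected output (toy assumption: elementwise addition).
        fused = [f + p for f, p in zip(feat, prev_out)]
        candidates = [[x + o for x, o in zip(fused, off)] for off in self.offsets]
        if target is not None:
            # Assumed training-style selection: keep the candidate closest to the target.
            idx = min(range(self.k), key=lambda i: sq_dist(candidates[i], target))
        else:
            # Assumed generation-style selection: pick one candidate at random.
            idx = rng.randrange(self.k)
        # Both the features and the selected output are passed downstream.
        return fused, candidates[idx], idx


def run(
    layers: List[ToyDDLayer], dim: int, target: Optional[Vector] = None, seed: int = 0
) -> Tuple[Vector, List[int]]:
    """Single forward pass through all layers; returns the final output and chosen indices."""
    rng = random.Random(seed)
    feat = [0.0] * dim
    out = [0.0] * dim
    path = []
    for layer in layers:
        feat, out, idx = layer.forward(feat, out, target, rng)
        path.append(idx)
    return out, path


if __name__ == "__main__":
    layers = [ToyDDLayer(dim=4, k=8, seed=s) for s in range(3)]
    sample, path = run(layers, dim=4, target=[0.5, -0.2, 0.1, 0.3])
    print(path, sample)
```

Dropping the `feat` argument and feeding each layer only `prev_out` would turn this toy into the pure "selected output only" variant the question asks about; in that variant each layer would have to recompute whatever features it needs from the sample alone.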