Comment by cttet
3 days ago
It seems to pass both a feature and a discrete number into the next layer. Which one did you think of first? Or is passing both by design?
I understand that by "discrete number" you mean the selected output of each layer.
Both the "feature" and the "selected output" are designed to be passed to the next layer.
Ah, the selected output — yes, that is what I meant; I was a bit confused. So in your initial design, when you first tried it, did you already pass both to the next layer? Or was that something you later found to perform better?
Even in the earliest stages of the DDN concept, we had already decided to pass features down to the next layer.
I never even ran an ablation that disabled the stem features; I assume the network would still train without them, but since the previous layer has already computed the features, it would be wasteful not to reuse them. Retaining the stem features also lets DDN adopt the more efficient single-shot-generator architecture.
Another deeper reason is that, unlike diffusion models, DDN does not need the Markov-chain property between adjacent layers.
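For concreteness, here is a minimal numpy sketch of how I read the design: each layer fuses the upstream feature with the previously selected output, emits K candidate outputs (the discrete distribution), selects one (guided by the target during training, randomly during sampling), and passes *both* the new feature and the selected output downstream. All names, shapes, and the fusion/selection details are my own assumptions for illustration, not the authors' actual implementation.

```python
import numpy as np

def ddn_layer(feature, selected, params, target=None, rng=None):
    """Hypothetical DDN-style layer (illustrative only).

    Fuses the upstream feature with the previously selected output,
    produces K candidate outputs, selects one, and returns BOTH the
    new feature and the selected output for the next layer.
    """
    # fuse the upstream feature with the previously selected output
    h = np.tanh(params["w_f"] @ feature + params["w_s"] @ selected)
    # K candidate outputs from K heads: shape (K, d_out)
    candidates = params["heads"] @ h
    if target is not None:
        # training-style guided selection: pick the candidate closest to the target
        k = int(np.argmin(((candidates - target) ** 2).sum(axis=1)))
    else:
        # sampling: pick a candidate at random
        k = int(rng.integers(len(candidates)))
    return h, candidates[k]

rng = np.random.default_rng(0)
d_feat, d_out, K = 16, 4, 3  # feature dim, output dim, number of candidates
params = {
    "w_f": rng.standard_normal((d_feat, d_feat)),
    "w_s": rng.standard_normal((d_feat, d_out)),
    "heads": rng.standard_normal((K, d_out, d_feat)),
}
feature = rng.standard_normal(d_feat)
selected = np.zeros(d_out)
target = rng.standard_normal(d_out)

# layer 1: guided selection against the target
feature, selected = ddn_layer(feature, selected, params, target=target)
# layer 2 reuses BOTH the feature and the selected output from layer 1
feature, selected = ddn_layer(feature, selected, params, rng=rng)
```

Because each layer receives the raw feature in addition to the selected output, the next layer is not conditioned solely on the previous selection, which matches the point above about DDN not needing the Markov-chain property between adjacent layers.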