Comment by cooljoseph
2 days ago
This sounds somewhat like a normalizing flow from a discrete space to a continuous space. I think there's a way to rewrite your DDN layer as a normalizing flow that avoids the whole split-and-prune method.
1. Replace the DDN layer with a flow between images and a latent variable. During training, compute in the direction image -> latent. During inference, compute in the direction latent -> image.
2. For your discrete options 1, ..., k, have trainable latent variables z_1, ..., z_k. This is a "codebook".
Training looks like the following: start with an image and run the flow from the image to the latent space (with conditioning, etc.). Find the closest option z_i and compute the L2 loss between z_i and your flowed latent variable. Additionally, add a loss corresponding to the log-determinant of the Jacobian of the flow; this second term is how a normalizing flow avoids mode collapse. Finally, I think you should divide the resulting gradient by the softmax of the negative L2 losses over all the latent variables, for the same reason the gradient is divided when training a mixture-of-experts model.
During inference, choose any latent variable z_i and flow from that to a generated image.
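For concreteness, a rough sketch of what this could look like. This is toy illustration code, not DDN's implementation: `AffineCouplingFlow`, `training_step`, the shapes, and the hyper-parameters are all made up, and a real flow would stack many coupling layers and condition on the previous level.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineCouplingFlow(nn.Module):
    """Toy invertible flow x <-> z with a tractable log|det J| (one coupling layer)."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        # Training direction: image -> latent.
        x1, x2 = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(x1).chunk(2, dim=1)
        z2 = x2 * torch.exp(log_s) + t
        logdet = log_s.sum(dim=1)          # log|det J| of the coupling transform
        return torch.cat([x1, z2], dim=1), logdet

    def inverse(self, z):
        # Inference direction: latent -> image.
        z1, z2 = z[:, :self.half], z[:, self.half:]
        log_s, t = self.net(z1).chunk(2, dim=1)
        x2 = (z2 - t) * torch.exp(-log_s)
        return torch.cat([z1, x2], dim=1)

def training_step(flow, codebook, x, optimizer, eps=1e-6):
    """One step of the proposed scheme: flow x -> z, pull z toward the nearest
    code, add the log-det term, and divide by the softmax of the negative
    squared distances (the mixture-of-experts-style scaling described above)."""
    z, logdet = flow(x)                              # (B, D), (B,)
    d2 = torch.cdist(z, codebook) ** 2               # (B, K): squared L2 to every code
    resp = F.softmax(-d2, dim=1).detach()            # responsibilities, no grad through them
    idx = d2.argmin(dim=1)                           # nearest code per sample
    w = resp.gather(1, idx[:, None]).squeeze(1)      # softmax weight of the chosen code
    l2 = (z - codebook[idx]).pow(2).sum(dim=1)
    loss = (l2 / w.clamp_min(eps) - logdet).mean()   # "divide the gradient by the softmax"
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical usage: 64-D flattened inputs, K = 16 codes.
flow = AffineCouplingFlow(dim=64)
codebook = nn.Parameter(torch.randn(16, 64))
opt = torch.optim.Adam(list(flow.parameters()) + [codebook], lr=1e-3)
for _ in range(100):
    batch = torch.rand(32, 64)                       # stand-in for real image data
    training_step(flow, codebook, batch, opt)
gen = flow.inverse(codebook[:1].detach())            # pick a code, flow back to an "image"
```

Whether the 1/softmax scaling is the right estimator is an open question; it simply mirrors the MoE-style gradient division mentioned above.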
Thanks for the idea, but DDN and flow can’t be flipped into each other that easily.
1. DDN doesn't need to be invertible.
2. Its latent is discrete, not continuous.
3. As far as I know, a flow keeps input and output the same size so it can compute log|det J|; DDN's latent is 1-D and discrete, so that condition fails (see the identity below).
4. To me, "hierarchical many-shot generation + split-and-prune" is simpler and more general than "invertible design + log|det J|".
5. Your design seems to abandon the defining characteristics of DDN (ZSCG, 1-D tree latent, lossy compression).
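On point 3, the standard change-of-variables identity a flow trains with (textbook background, nothing DDN-specific) is:

```latex
% Change of variables for an invertible f: R^n -> R^n.
% The Jacobian \partial f / \partial x must be square (dim x = dim z)
% for the determinant to exist, hence the same-size constraint in point 3.
\log p_X(x) = \log p_Z\bigl(f(x)\bigr) + \log\left|\det \frac{\partial f(x)}{\partial x}\right|
```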
The two designs start from different premises and are built differently. Your proposal would change so much that whatever came out wouldn’t be DDN any more.
> This sounds somewhat like...
Linus once said: "Talk is cheap. Show me the code."