Comment by jc4p
6 months ago
i've been trying to keep up with this field (image generation) so here's quick notes I took:
Claude's Summary: "Normalizing flows aren't dead, they just needed modern techniques"
My Summary: "Transformers aren't just for text"
1. SOTA model for likelihood on ImageNet 64×64: 2.99 BPD (bits per dimension), the first ever below 3.2, where the previous best was set by a hybrid diffusion model
2. Autoregressive (transformer) approach. Right now diffusion is the most popular in this space (it's much faster, but a different approach)
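For anyone unfamiliar with the metric in point 1: bits per dimension is just the model's negative log-likelihood for an image, converted from nats to bits and averaged over every pixel channel. A rough sketch of the conversion, where the function name and the 3-channel 64×64 shape are my own assumptions for illustration:

```python
import math


def bits_per_dim(nll_nats: float, num_dims: int) -> float:
    """Convert a per-image negative log-likelihood in nats to bits per dimension."""
    # Dividing by ln(2) turns nats into bits; dividing by num_dims averages over dimensions.
    return nll_nats / (num_dims * math.log(2))


# Assumption: an ImageNet 64x64 RGB image has 64 * 64 * 3 dimensions.
IMAGENET64_DIMS = 64 * 64 * 3
```

Lower is better: a smaller BPD means the model assigns higher likelihood to held-out images.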
tl;dr of autoregressive vs diffusion (there's also other approaches)
Autoregression: step based, generate a little then more then more
Diffusion: generate a lot of noise then try to clean it up
The diffusion approach that is the baseline for SOTA is Flow Matching from Meta: https://arxiv.org/abs/2210.02747 -- lots of fun reading material if you throw both of these into an LLM and ask it to summarize the approaches!
You have a few minor errors and I hope I can help out.
You could say the "start from noise, then clean it up" part about Flows too. Their history is shared with diffusion and goes back to the Whitening Transform. Flows work by a coordinate transform, so we have an isomorphism, whereas diffusion works through (for easier understanding) a hierarchical mixture of Gaussians, which is a lossy process (this gets more confusing with latent diffusion models, which are the primary type used). The goal of a Normalizing Flow is to turn the distribution your data is sampled from, which you don't have an explicit representation of, into a tractable probability distribution (typically Normal noise/Gaussian). So in effect, there are a lot of similarities here. I'd highly suggest learning about Flows if you want to better understand Diffusion Models.
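To make the coordinate-transform idea concrete, here is a toy sketch of the simplest flow I can think of: a 1D whitening transform. The class and function names are hypothetical, and a real flow would stack many learned invertible layers instead of one fixed affine map. The point is only that an invertible map plus the change-of-variables formula gives an exact log-likelihood.

```python
import math


def standard_normal_logpdf(z: float) -> float:
    """Log-density of the standard Gaussian base distribution."""
    return -0.5 * (z * z + math.log(2 * math.pi))


class AffineFlow1D:
    """Hypothetical toy flow: z = (x - shift) / scale, i.e. a 1D whitening transform."""

    def __init__(self, shift: float, scale: float) -> None:
        if scale <= 0:
            raise ValueError("scale must be positive so the map is invertible")
        self.shift = shift
        self.scale = scale

    def forward(self, x: float) -> float:
        """Data -> noise direction."""
        return (x - self.shift) / self.scale

    def inverse(self, z: float) -> float:
        """Noise -> data direction; exact, nothing is lost (the isomorphism)."""
        return z * self.scale + self.shift

    def log_prob(self, x: float) -> float:
        """Change of variables: log p_x(x) = log p_z(z) + log|dz/dx|, with dz/dx = 1/scale."""
        z = self.forward(x)
        return standard_normal_logpdf(z) - math.log(self.scale)
```

Sampling is just drawing z from the Gaussian and calling `inverse(z)`. With these parameters the flow models data as a Gaussian with mean `shift` and standard deviation `scale`.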
To be clear, Flow Matching is a Normalizing Flow. Specifically, it is a Continuous and Conditional Normalizing Flow. If you want to get into the nitty-gritty, Ricky has a really good tutorial on the stuff [0].
[0] https://arxiv.org/abs/2412.06264
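A small sketch of what "continuous" and "conditional" mean here, under assumptions of my own: a 1D problem, a straight-line path from a noise sample x0 to a single data point x1 with no residual noise at the end, and the closed-form conditional velocity used directly instead of a trained network. All function names are hypothetical. In practice a neural network is trained to regress onto the target velocity, averaged over many (x0, x1) pairs.

```python
def interpolate(x0: float, x1: float, t: float) -> float:
    """Point on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return (1 - t) * x0 + t * x1


def target_velocity(x0: float, x1: float) -> float:
    """Regression target for the velocity model along that path: d/dt interpolate."""
    return x1 - x0


def conditional_velocity(x: float, x1: float, t: float) -> float:
    """Closed-form velocity at position x and time t < 1, conditioned on data point x1."""
    return (x1 - x) / (1 - t)


def euler_sample(x0: float, x1: float, num_steps: int = 10) -> float:
    """Integrate dx/dt = velocity from t=0 to t=1 with fixed Euler steps."""
    x = x0
    for k in range(num_steps):
        t = k / num_steps  # always < 1, so the division above is safe
        x += conditional_velocity(x, x1, t) / num_steps
    return x
```

The flow is the solution of an ODE rather than a stack of discrete layers (the "continuous" part), and the velocity field is defined per data point before being averaged (the "conditional" part). In this toy case the integration carries the noise sample x0 straight to x1.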
thank you so much!!! i should’ve put that final sentence in my post!
Happy to help and if you have any questions just ask, this is my jam