Comment by programjames
14 hours ago
As explanation, something I wrote previously:
The most common approach to modeling a continuous distribution is to train a reversible model f that maps it to another continuous distribution P that is already known. The original image can be recovered from its latent via the reverse path, and its codelength is the bits needed to encode the latent plus a volume-change correction:
−log P(f(x)) − log|det ∂f/∂x (x)|
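A toy 1D instance of this codelength (my own illustration, not from the parent): take the flow f(x) = (x − μ)/σ, which maps N(μ, σ²) onto the standard normal P. Then −log P(f(x)) − log|det ∂f/∂x| matches coding x directly under N(μ, σ²):

```python
import numpy as np

# Hypothetical flow: f(x) = (x - mu) / sigma maps N(mu, sigma^2)
# onto the standard normal P.
mu, sigma = 2.0, 0.5
f = lambda x: (x - mu) / sigma
df_dx = 1.0 / sigma                    # the 1x1 Jacobian of f

# Log-density of the standard normal
log_P = lambda z: -0.5 * z**2 - 0.5 * np.log(2 * np.pi)

x = 1.3
# Codelength via the flow: -log P(f(x)) - log|det df/dx|
nats_flow = -log_P(f(x)) - np.log(abs(df_dx))
# Codelength coding x directly under N(mu, sigma^2)
nats_direct = 0.5 * ((x - mu) / sigma) ** 2 + np.log(sigma) + 0.5 * np.log(2 * np.pi)
print(np.isclose(nats_flow, nats_direct))  # True
```

The Jacobian term is exactly what accounts for the stretching by 1/σ, so no bits are gained or lost by moving to the latent space.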
This technique is known as normalizing flows, since a normal distribution is usually chosen as the known distribution. The second term can be hard to compute, so diffusion models sidestep it by using a stochastic differential equation for the mapping. When f is the solution of an ordinary differential equation,
dx/dt = g(x)
then
log|det ∂f/∂x (x)| = ∫ Tr(∂g(x)/∂x) dt = ∫ E_{ε∼N(0,I)} [εᵀ ∂g(x)/∂x ε] dt
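The expectation on the right is easy to check numerically (a sketch of my own; M stands in for the Jacobian ∂g/∂x):

```python
import numpy as np

# Hutchinson's trace estimator: Tr(M) = E[eps^T M eps] for eps ~ N(0, I).
# M here is an arbitrary matrix standing in for the Jacobian dg/dx.
rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))

eps = rng.standard_normal((200_000, n))           # many probe vectors
estimate = np.einsum('bi,ij,bj->b', eps, M, eps).mean()
print(estimate, np.trace(M))                      # agree to ~2 decimal places
```

The point is that each sample only needs a Jacobian-vector product, not the full Jacobian, which is what makes the estimator cheap inside an ODE solver.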
The expectation form is known as Hutchinson's estimator. Switching to a stochastic differential equation
dx′ = g(x′)dt + ε(t)dW
and tracking the difference δx = x′ − x, the squared deviation approximately satisfies
d(δxᵀδx)/dt = 2δxᵀ ∂g(x)/∂x δx,
which is close to Hutchinson's estimator, just with a slightly different weighting.
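A quick numerical sanity check of that connection (my own, assuming a linear drift g(x) = Ax): for short times δx is roughly N(0, ε²t·I), so E[δxᵀ(∂g/∂x)δx] ≈ ε²t·Tr(∂g/∂x) — a Hutchinson-style trace estimate, with the non-uniform weighting creeping in at later times as the covariance of δx drifts away from a multiple of the identity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear drift g(x) = A x, so the Jacobian dg/dx is just A.
A = np.array([[ 0.5, -1.0,  0.3],
              [ 0.8,  0.4, -0.2],
              [-0.6,  0.2,  0.7]])
eps, T, dt, paths = 1.0, 0.02, 5e-4, 20_000
n_steps = int(T / dt)

# Euler-Maruyama on d(delta_x) = A delta_x dt + eps dW, starting at 0
dx = np.zeros((paths, A.shape[0]))
for _ in range(n_steps):
    noise = rng.standard_normal(dx.shape)
    dx = dx + dx @ A.T * dt + eps * np.sqrt(dt) * noise

# E[dx^T A dx] / (eps^2 T) should land near Tr(A) = 1.6 for small T
estimate = np.mean(np.einsum('bi,ij,bj->b', dx, A, dx)) / (eps**2 * T)
print(estimate, np.trace(A))
```

So the noisy paths effectively probe the Jacobian for us, the way the ε vectors do in the explicit estimator.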