Comment by btickell

7 months ago

Thought I'd weigh in here as well: I believe Gibbs sampling is being used to approximate the expectation under the model distribution. That expectation is required to compute the gradient of the log-likelihood, but evaluating it exactly means integrating (or summing) over every configuration of the model, which is intractable.
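
To spell that out (assuming the energy-based / RBM setting, which this thread appears to be about), the log-likelihood gradient splits into a tractable data term and an intractable model term:

$$\frac{\partial \log p(v)}{\partial \theta} = -\mathbb{E}_{p(h \mid v)}\!\left[\frac{\partial E(v, h)}{\partial \theta}\right] + \mathbb{E}_{p(v, h)}\!\left[\frac{\partial E(v, h)}{\partial \theta}\right]$$

It's the second expectation, taken over the model's own joint distribution, that Gibbs sampling stands in for with samples.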

This is similar to how you might use MCMC to draw a representative sample from a VAE. In the typical deep learning formulation, the gradient is instead estimated over mini-batches of the dataset rather than over an explicitly modeled probability distribution.
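
For concreteness, here's a minimal sketch of that idea for a binary RBM (the RBM setup, function names, and the CD-1 choice are my own assumptions for illustration, not something from the original post): one Gibbs step produces the "negative phase" samples that approximate the intractable model expectation in the gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b_v, b_h):
    """One Gibbs sweep for a binary RBM: sample h | v, then v | h."""
    p_h = sigmoid(v @ W + b_h)                         # p(h=1 | v)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = sigmoid(h @ W.T + b_v)                       # p(v=1 | h)
    v_new = (rng.random(p_v.shape) < p_v).astype(float)
    return v_new

def cd1_gradient(v_data, W, b_v, b_h):
    """CD-1 estimate of the log-likelihood gradient w.r.t. W.

    Positive phase uses the data; negative phase uses a single
    Gibbs step to approximate the model expectation."""
    p_h_data = sigmoid(v_data @ W + b_h)
    v_model = gibbs_step(v_data, W, b_v, b_h)
    p_h_model = sigmoid(v_model @ W + b_h)
    # <v h>_data - <v h>_model, averaged over the batch
    return (v_data.T @ p_h_data - v_model.T @ p_h_model) / v_data.shape[0]

# Toy usage: 8 visible units, 4 hidden, batch of 16 random binary vectors
W = rng.normal(0, 0.1, size=(8, 4))
b_v, b_h = np.zeros(8), np.zeros(4)
batch = (rng.random((16, 8)) < 0.5).astype(float)
print(cd1_gradient(batch, W, b_v, b_h).shape)  # (8, 4)
```

Running the chain for more steps (CD-k) gives a better approximation of the model expectation, at the cost of more compute per gradient update.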