Comment by AlexCoventry

2 days ago

Doesn't involve Gaussians, but:

The Free Transformer: https://arxiv.org/abs/2510.17558

Abstract: We propose an extension of the decoder Transformer that conditions its generative process on random latent variables which are learned without supervision thanks to a variational procedure. Experimental evaluations show that allowing such a conditioning translates into substantial improvements on downstream tasks.