Comment by f_devd

2 days ago

You are probably right, although it's not similar to a GAN at all, it is significantly more like diffusion (although maybe not latent, the main reason I assumed so is because the "features" are passed-through but these can just be the image).

The ahead-of-time sampling doesn't make much sense to me mechanically, and isn't really mentioned much. But I will hold my judgement for future versions since the FID performance of this first iteration is still not that great.