Comment by rafaelero

2 years ago

VQVAE is trained to reconstruct the image, so in theory it should contain all the information (both content and location) inside its embeddings.

0 comments