Comment by earthnail
6 hours ago
It took me a moment to understand what you mean by "autoencoders on steroids", but I believe you mean they are autoencoders with an inverse bottleneck: an intermediate representation that isn't smaller than the input space, but much larger. Is my understanding of your comment correct?
Kind of. Autoencoders don’t need to have an embedding that’s smaller than the input. Their only requirement is that they compress information and thus incur reconstruction loss. Typically, however, they are not trained this way because they don’t converge. Transformers do the same thing, but they can squeeze many more bits of information through one pass because of the way they are designed. This holds true even for decoder-only networks, because they’re still doing the same thing.
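
To make the "wider code, still lossy" point concrete, here is a toy sketch in plain Python. Everything in it is an assumption made for illustration: the weights are hand-picked rather than trained, the names (`W_ENC`, `W_DEC`, `reconstruct`) are made up, and rounding the code units is only a stand-in for whatever actually limits information in a real model (noise, sparsity, and so on). It is not a claim about how transformers work internally.

```python
# Toy overcomplete autoencoder: 2-d input, 4-d code.
# Weights are hand-picked for illustration; nothing here is trained.

def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

# Encoder copies each input coordinate twice, so the code is wider than the input.
W_ENC = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
# Decoder averages the two copies of each coordinate.
W_DEC = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]

def reconstruct(x, constrain_code=False):
    """Encode x into the 4-d code, optionally limit its information, then decode."""
    code = matvec(W_ENC, x)
    if constrain_code:
        # Assumed stand-in for an information limit: round every code unit
        # to the nearest integer, discarding the fractional part of the signal.
        code = [float(round(c)) for c in code]
    return matvec(W_DEC, code)

x = [0.3, 1.7]
loss_free = mse(x, reconstruct(x))                               # wide code, no limit
loss_constrained = mse(x, reconstruct(x, constrain_code=True))   # wide code, limited
print(loss_free, loss_constrained)
```

With no limit on the code, the wide embedding simply passes the input through and the reconstruction loss is zero, so nothing is compressed. Once the code is limited, the same 4-d embedding loses information and the reconstruction loss becomes nonzero, even though the code is twice as wide as the input.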