← Back to context

Comment by usernametaken29

5 hours ago

Kind of. Autoencoders don’t need to have an embedding that’s smaller than the input. Their only requirement is that they compress information and thus create reconstruction loss. Typically however they are not trained this way because they don’t converge.. transformers do the same thing, but they can squeeze much more bits of information through one pass because the way they are designed. This holds true even for decoder only networks because they’re still doing the same thing