Comment by bytefactory
5 days ago
Can you clarify my understanding as a layman, please?
Are you saying that LLMs hold concepts in latent space (weights?), but the actual predictions are always in tokens (thus inefficient and lossy), whereas JEPA operates directly on concepts in latent space (plus encoders/decoders)?
I might be using the jargon incorrectly!
Yes, that's right.
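To make the contrast concrete, here's a minimal, hypothetical PyTorch-style sketch (the class names, the GRU backbone, and all dimensions are made up for illustration, not taken from any real LLM or JEPA implementation). The point is just where the loss lives: the LM is trained with a cross-entropy over the vocabulary, so its prediction target is discrete tokens, while the JEPA-style model is trained to regress a target *embedding*, so the prediction never leaves latent space.

```python
# Hypothetical sketch only: names and sizes are illustrative, not real model code.
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64

# LLM-style: internal states are continuous, but the training target is a
# discrete next token, so the loss lives in token space (cross-entropy over
# the vocabulary).
class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)      # tokens -> latent
        self.backbone = nn.GRU(d_model, d_model, batch_first=True)
        self.to_logits = nn.Linear(d_model, vocab_size)     # latent -> tokens again

    def forward(self, tokens):
        h, _ = self.backbone(self.embed(tokens))
        return self.to_logits(h)                            # [B, T, vocab]

# JEPA-style: both context and target are encoded into latent space, and a
# predictor is trained to match the target embedding, so the loss stays in
# latent space (no decoding back to tokens or pixels).
class TinyJEPA(nn.Module):
    def __init__(self):
        super().__init__()
        self.context_encoder = nn.Linear(d_model, d_model)
        self.target_encoder = nn.Linear(d_model, d_model)   # often an EMA copy in practice
        self.predictor = nn.Linear(d_model, d_model)

    def forward(self, context_x, target_x):
        pred = self.predictor(self.context_encoder(context_x))
        with torch.no_grad():                               # target branch gets no gradient
            tgt = self.target_encoder(target_x)
        return pred, tgt

if __name__ == "__main__":
    # LM loss: cross-entropy in token space.
    lm = TinyLM()
    tokens = torch.randint(0, vocab_size, (2, 16))
    logits = lm(tokens[:, :-1])
    lm_loss = nn.functional.cross_entropy(
        logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))

    # JEPA loss: regression between predicted and target latents.
    jepa = TinyJEPA()
    ctx, tgt_in = torch.randn(2, d_model), torch.randn(2, d_model)
    pred, tgt = jepa(ctx, tgt_in)
    jepa_loss = nn.functional.mse_loss(pred, tgt)

    print(f"LM loss (token space): {lm_loss.item():.3f}")
    print(f"JEPA loss (latent space): {jepa_loss.item():.3f}")
```

So "operates directly on concepts in latent space" just means the whole predict-and-compare loop happens between embeddings, with encoders (and optionally decoders) sitting at the boundary to the raw data.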