Comment by daxfohl

3 months ago

Yeah, but IIUC they're both just representations of embeddings in a latent space, translated from one format to another. So if the image interpretation of a text embedding is full of hallucinations, it's unlikely the other direction works well either (again, IIUC).

That said, I'll be interested to see what the DeepSeek model can do once they've trained it in the other direction. It'd be great to have it output architecture diagrams that actually correspond to what it says in the chat.