Comment by valine
3 months ago
Image generation and image input are two totally different things. This is about feeding text into LLMs as images, it has nothing to do with image generation.
Yeah, but IIUC they're both just representations of the same latent-space embeddings, translated from one format to another. So if the image interpretation of a text embedding is full of hallucinations, it's unlikely that the other direction works well either (again, IIUC).
That said, I'll be interested to see what the DeepSeek model can do once they've trained it in the other direction. It'd be great to have it output architecture diagrams that actually correspond to what it says in the chat.
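For anyone unfamiliar with the setup being discussed, here's a minimal sketch of what "feeding text into an LLM as an image" means in practice: the text is rasterized to a bitmap, and that bitmap goes to the model's vision encoder instead of its text tokenizer. The rendering helper and the comment about the downstream VLM call are illustrative assumptions, not the specific pipeline from the article.

```python
# Sketch: render text to an image so it can be consumed by a vision encoder
# rather than a text tokenizer. Only PIL is needed for the rendering step.
from PIL import Image, ImageDraw

def render_text_as_image(text: str, width: int = 800, line_height: int = 18) -> Image.Image:
    """Rasterize plain text onto a white canvas, one line per row."""
    lines = text.splitlines() or [text]
    img = Image.new("RGB", (width, line_height * (len(lines) + 1)), "white")
    draw = ImageDraw.Draw(img)
    for i, line in enumerate(lines):
        draw.text((10, 10 + i * line_height), line, fill="black")
    return img

page = render_text_as_image("The quick brown fox jumps over the lazy dog.\n" * 5)
page.save("page.png")

# From here, the saved image would be passed to a vision-language model
# (e.g. via a transformers AutoProcessor on a VLM checkpoint), so the text
# enters the model as image patches rather than text tokens.
```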