Comment by valine
3 months ago
Image generation and image input are two totally different things. This is about feeding text into LLMs as images, it has nothing to do with image generation.
Yeah, but IIUC they're both just representations of the same latent-space embeddings, translated from one format to another. So if the image interpretation of a text embedding is full of hallucinations, it's unlikely that the other direction works well either (again, IIUC).
That said, I'll be interested to see what the DeepSeek model can do once they've trained it in the other direction. It'd be great to have it output architecture diagrams that actually correspond to what it says in the chat.
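For anyone unfamiliar with the setup being discussed, here's a minimal sketch of what "feeding text into an LLM as an image" means in practice: the text is rasterized to a bitmap, and that bitmap goes to the model's vision encoder instead of its text tokenizer. The rendering helper and the comment about the downstream VLM call are illustrative assumptions, not the specific pipeline from the article.

```python
# Sketch: render text to an image so it can be consumed by a vision encoder
# rather than a text tokenizer. Only PIL is needed for the rendering step.
from PIL import Image, ImageDraw

def render_text_as_image(text: str, width: int = 800, line_height: int = 18) -> Image.Image:
    """Rasterize plain text onto a white canvas, one line per row."""
    lines = text.splitlines() or [text]
    img = Image.new("RGB", (width, line_height * (len(lines) + 1)), "white")
    draw = ImageDraw.Draw(img)
    for i, line in enumerate(lines):
        draw.text((10, 10 + i * line_height), line, fill="black")
    return img

page = render_text_as_image("The quick brown fox jumps over the lazy dog.\n" * 5)
page.save("page.png")

# From here, the saved image would be passed to a vision-language model
# (e.g. via a transformers AutoProcessor on a VLM checkpoint), so the text
# enters the model as image patches rather than text tokens.
```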