Comment by edude03
9 months ago
How do we know it generates the images itself and isn’t passing the text to dalle? It’s supposedly how the current gpt4 model does listen mode (with whisper but same idea)
9 months ago
How do we know it generates the images itself and isn’t passing the text to dalle? It’s supposedly how the current gpt4 model does listen mode (with whisper but same idea)
Go to the "Explorations of capabilities" and explore all the capabilities: https://openai.com/index/hello-gpt-4o/
You cannot have this level of control by prompting Dalle, also GPT-4o isn't using Whisper (older GPT-4s yes).
At least ChatGPT 4o still looks like it is using dalle.
https://x.com/krishnanrohit/status/1755123169353236848?s=46
Two reasons - the shown capabilities are way beyond what dalle is capable of, and they've been clear that this "omni" model by the "omni team" is natively multimodal