Comment by edude03

2 years ago

How do we know it generates the images itself and isn't just passing the text to DALL-E? That's supposedly how the current GPT-4 model does listen mode (with Whisper, but same idea).


GaggiX 2 years ago

Go to the "Explorations of capabilities" section and look through all the examples: https://openai.com/index/hello-gpt-4o/

You can't get this level of control by prompting DALL-E. Also, GPT-4o isn't using Whisper (the older GPT-4 models were).

ec109685 2 years ago

At least GPT-4o in ChatGPT still looks like it's using DALL-E.

https://x.com/krishnanrohit/status/1755123169353236848?s=46

hackerlight 2 years ago

Two reasons: the capabilities shown are way beyond what DALL-E can do, and they've been clear that this "omni" model, built by the "omni team", is natively multimodal.