Comment by Mizza

1 year ago

Have any multimodal models been reasoning-trained yet?

3 comments

coder543 1 year ago

https://platform.openai.com/docs/models/#o1

> The latest o1 model supports both text and image inputs

macrolime 1 year ago
But not multimodal reasoning: the intermediate and output tokens are text only, at least in the released version. They probably have actual multimodal reasoning that hasn't been shown yet, since they already showed that gpt-4o can output image tokens, but that hasn't been released yet either.
- coder543 1 year ago
  
  That wasn’t the question. They asked whether any multimodal models had been reasoning-trained. o1 fits that criterion precisely, and it can reason about the image input.
  They didn’t ask about a model that can create images while thinking. That’s an entirely unrelated topic.