Comment by macrolime

2 days ago

But not multimodal reasoning: the intermediate and output tokens are text only, at least in the released version. They probably have actual multimodal reasoning that hasn't been shown yet, since they already showed that gpt-4o can output image tokens, but that hasn't been released yet either.

That wasn’t the question… they asked if any multimodal models had been reasoning-trained. o1 fits that criterion precisely, and it can reason about image input.

They didn’t ask about a model that can create images while thinking. That’s an entirely unrelated topic.