Comment by macrolime

1 year ago

But that's not multimodal reasoning: the intermediate and output tokens are text only, at least in the released version. They probably have actual multimodal reasoning that hasn't been shown yet, since they already showed that GPT-4o can output image tokens, though that hasn't been released yet either.

coder543 1 year ago

That wasn’t the question… they asked if any multimodal models had been reasoning-trained. o1 fits that criterion precisely, and it can reason about the image input.

They didn’t ask about a model that can create images while thinking. That’s an entirely unrelated topic.