Comment by another_poster

3 months ago

Is “multimodal reasoning” as big a deal as it sounds? Does this technique mean LLMs can generate chains of thought that map to other modalities, such as sound and images?

From what I understand (not an expert), that seems to be the goal: to see whether knowledge in one modality can be translated into another. For example, if a model trained on sound could leverage knowledge of music theory, that would be quite interesting.