Comment by another_poster
10 months ago
Is “multimodal reasoning” as big a deal as it sounds? Does this technique mean LLMs can generate chains of thought that map to other modalities, such as sound and images?
From what I understood (not an expert), that seems to be the goal: to see whether knowledge in one modality can be translated into another. For example, it would be quite interesting if a model trained on sound could leverage knowledge of music theory.
It'd be cool to see its reasoning for solving visual puzzles, as imagery.