Comment by another_poster
3 months ago
Is “multimodal reasoning” as big a deal as it sounds? Does this technique mean LLMs can generate chains of thought that map to other modalities, such as sound and images?
3 months ago
Is “multimodal reasoning” as big a deal as it sounds? Does this technique mean LLMs can generate chains of thought that map to other modalities, such as sound and images?
From what I understood (not an expert), it seems that it's the goal, to see if the knowledge in one modality can be translated in an another one. Typically, if a model trained on sound can leverage the knowledge of musical theory, it would be quite interesting
It'd be cool to see its reasoning for solving visual puzzles, as imagery.