Comment by vermilingua

7 hours ago

I don’t think this analogy holds. The whole way through the processing pipeline in the brain, different sensory data is ingested separately and processed separately; and we still don’t understand how that data is then integrated into a cohesive experience.

LLMs have the same fundamental input regardless of modality, tokens. There is a preprocessing step before the “brain”, which is more akin to some super-synesthesia where all senses are translated into sound before becoming experience.