Comment by cortesoft

13 hours ago

LLMs also have other inputs, like audio and images. They get encoded (just like a human eye encodes an image) and passed to the weights.

I don’t think this analogy holds. Throughout the brain’s processing pipeline, different sensory data is ingested and processed separately, and we still don’t understand how that data is then integrated into a cohesive experience.

LLMs have the same fundamental input regardless of modality: tokens. There is a preprocessing step before the “brain”, which makes it more akin to a kind of super-synesthesia where all senses are translated into sound before becoming experience.