Comment by cortesoft
15 hours ago
LLMs also have other inputs, like audio and images. They get encoded (just like a human eye encodes an image) and passed to the weights.
I don’t think this analogy holds. All the way through the brain’s processing pipeline, different kinds of sensory data are ingested and processed separately, and we still don’t understand how that data is then integrated into a cohesive experience.
LLMs have the same fundamental input regardless of modality: tokens. There is a preprocessing step before the “brain”, which makes it more akin to some super-synesthesia where all senses are translated into sound before becoming experience.
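A toy sketch of that preprocessing step, in case it helps make the point concrete. Everything here is hypothetical: the ID offsets, the function names, and the "encoders" are made up for illustration, and real multimodal models generally project image and audio features into a shared embedding space rather than using literal integer IDs like this. The only thing it is meant to show is that the model ends up with one flat sequence, whatever the source modality was.

```python
# Toy illustration only: every modality is reduced to the same kind of thing
# (integer token IDs) before the model sees it. Offsets and encoders are
# hypothetical stand-ins, not how any particular model works.

TEXT_OFFSET = 0       # assumed ID range for text tokens (0..255)
IMAGE_OFFSET = 1000   # assumed ID range for image-patch tokens
AUDIO_OFFSET = 2000   # assumed ID range for audio-frame tokens


def encode_text(text):
    # Stand-in for a real tokenizer: one token per UTF-8 byte.
    return [TEXT_OFFSET + b for b in text.encode("utf-8")]


def encode_image(pixels, patch=2):
    # Stand-in for a patch encoder: one token per patch,
    # using the patch's mean brightness (0..255) as the code.
    tokens = []
    for i in range(0, len(pixels), patch):
        chunk = pixels[i:i + patch]
        tokens.append(IMAGE_OFFSET + sum(chunk) // len(chunk))
    return tokens


def encode_audio(samples, frame=4):
    # Stand-in for an audio encoder: one token per frame,
    # using the frame's peak amplitude (samples in [-1.0, 1.0]) scaled to 0..255.
    tokens = []
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        peak = max(abs(s) for s in chunk)
        tokens.append(AUDIO_OFFSET + min(255, int(peak * 255)))
    return tokens


def build_model_input(text, pixels, samples):
    # The model receives a single flat sequence; nothing downstream
    # processes "vision" or "hearing" through a separate pathway.
    return encode_text(text) + encode_image(pixels) + encode_audio(samples)


seq = build_model_input("hi", [0, 10, 200, 250], [0.0, 0.5, -1.0, 0.25])
print(seq)
```

Contrast that with the brain picture above, where each sense keeps its own pathway for most of the pipeline.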