Comment by nickpsecurity
1 year ago
We learn the ideas from each mode of input. Then, one mode can elaborate on data learned from another mode. They build on each other.
From there, remember that text is usually a reflection of things in the real world. Understanding those things in non-textual ways both gives meaning to the text and deepens our understanding of it. Much of the text itself was even stored in other modes, like markup or PDFs, whose structure tells us things about it.
That we learn multimodally from birth is therefore an important point to make.
It might also be a prerequisite for AGI. It could even be one of the fundamental laws of information theory. Text alone might not be enough, much as digital devices need analog components to interface with the real world.