Comment by Vegenoid
10 days ago
I don't mean that the primary (or only) way that it interacts with a human can't be just text. Right now, the only way it interacts with anything is by generating a stream of tokens. To make any API calls, to use any tool, to make any query for knowledge, it is predicting tokens in the same way as it does when a human asks it a question. There may need to be other subsystems that the LLM subsystem interfaces with to make a more complete intelligence that can internally represent reality and fully utilize abstraction and relations.
I have not yet found any compelling evidence that suggests that there are limits to the maximum intelligence of a next token predictor.
Models can be trained to generate tokens with many different meanings, including visual, auditory, textual, and locomotive. Those alone seem sufficient to emulate a human to me.
It would certainly be cool to integrate some subsystems like a symbolic reasoner or calculator or something, but the bitter lesson tells us that we'd be better off just waiting for advancements in computing power.