Comment by tomohelix
15 hours ago
Technically, the models can already learn on the fly; it's just that the knowledge they can learn is limited to the context length. They cannot, to use the trendy word, "grok" it and internally adjust the weights in their neural networks yet.
To change this you would need to either let the model retrain itself every time it receives new information, or give it such a long context length that there is no effective difference. I suspect even meat models like our brains still struggle to do this effectively and need a long rest cycle (i.e. sleep) to handle it. So the problem is inherently more difficult to solve than just "thinking". We may even need an entirely new architecture, different from the neural network, to achieve this.
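To make the distinction concrete, here is a minimal toy sketch (PyTorch, with a tiny linear layer and random tensors standing in for a real language model and real text; the names and shapes are mine, not from any actual system) of "learning" by putting new information in the context versus actually retraining the weights:

```python
import torch
import torch.nn as nn

# Stand-ins: a tiny linear layer instead of an LLM, random tensors instead of text.
model = nn.Linear(16, 16)
new_fact = torch.randn(4, 16)      # "new information" the model should pick up

# Option 1: in-context "learning". The weights never change; the new
# information only exists as extra input, so it is gone once it falls
# out of the context window.
context = torch.cat([new_fact, torch.randn(8, 16)], dim=0)
with torch.no_grad():
    _ = model(context)

# Option 2: retraining on the fly. A gradient step actually moves the
# weights, so the information persists without occupying any context.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
target = torch.randn(4, 16)        # stand-in training signal
loss = nn.functional.mse_loss(model(new_fact), target)
loss.backward()
optimizer.step()                   # the model itself is now different
```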
Google just published a paper on a new neural architecture, called Titans, that does exactly that.
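For what it's worth, the core idea there (as I read it) is a memory module whose parameters are updated by a gradient step while the model is running, rather than only at training time. A very rough, simplified sketch of that idea follows; it is not a faithful reproduction of the paper (it drops the momentum term, the surprise gating, and the surrounding architecture, and the loss, learning rate, and decay values are arbitrary placeholders):

```python
import torch
import torch.nn as nn

class TestTimeMemory(nn.Module):
    """Toy long-term memory updated during inference (loosely inspired by Titans)."""
    def __init__(self, dim: int, lr: float = 0.01, decay: float = 0.001):
        super().__init__()
        self.memory = nn.Linear(dim, dim, bias=False)  # the weights that "learn on the fly"
        self.lr, self.decay = lr, decay

    def write(self, key: torch.Tensor, value: torch.Tensor) -> None:
        # "Surprise" = gradient of how badly the memory currently recalls value from key.
        loss = nn.functional.mse_loss(self.memory(key), value)
        grad = torch.autograd.grad(loss, self.memory.weight)[0]
        with torch.no_grad():
            # Gradient step plus mild forgetting, applied at test time.
            self.memory.weight.mul_(1 - self.decay).sub_(self.lr * grad)

    def read(self, query: torch.Tensor) -> torch.Tensor:
        return self.memory(query)

mem = TestTimeMemory(dim=32)
k, v = torch.randn(1, 32), torch.randn(1, 32)
mem.write(k, v)          # weights change while the model is "in use"
recalled = mem.read(k)
```

The interesting part is that `write` happens at inference time, so new information ends up in weights rather than in the context window.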
> Technically, the models can already learn on the fly. Just that the knowledge it can learn is limited to the context length.
Isn't that just improving the prompt to the non-learning model?
The only small problem is that the models are neither thinking nor understanding; I am not sure how this kind of wording is allowed with these models.
All words only gain meaning through common use: where two people mean different things by some word, they influence each other until they're in agreement.
Words about private internal state don't get feedback about what they actually are on the inside, just about what they look like on the outside.* "Thinking" and "understanding" map to what AIs give the outward impression of, even if the inside is different in whatever ways you regard as important.
* This is also why people with aphantasia keep reporting their surprise upon realising that scenes in films where a character is imagining something are not merely artistic license.