Comment by imtringued
20 days ago
You got this exactly backwards.
"I'm not fond of metaphors to human intelligence".
You're assuming that learning during inference is something specific to humans and that the suggestion is to add human elements into the model that are missing.
That isn't the case at all. The training process is already entirely human-specific by way of training on human data. You're already special-casing the model as hard as possible.
Human DNA doesn't contain all the information that fully describes the human brain, including the memories stored within it. It only contains the blueprints for a general-purpose distributed element, the neuron, and these building blocks are shared by basically any animal with a nervous system.
This means that if you want to get away from humans, you will have to build a model architecture that is more general, and more capable of doing anything imaginable, than the current ones.
Context is not suitable for learning because it wasn't built for that purpose. The entire point of transformers is that you specify a sequence and the model is trained on the entire sequence. This means that any in-context learning you want to perform must already lie inside the training distribution, which is another way of saying it was just pretraining after all.
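As a toy sketch of that distinction (illustrative numpy only, not any real framework): training updates the weights by gradient descent, while "in-context" use only changes the input the frozen weights are conditioned on, so any apparent adaptation has to be behaviour the trained weights already support.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pretraining": fit weights w by gradient descent on data the model sees.
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(3)
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(X)  # weights change: this is learning
    w -= 0.1 * grad

# "In-context" use: the weights are frozen. Extra examples in the prompt
# change the *input*, not w, so the model can only exhibit behaviour the
# trained w already supports, i.e. behaviour inside the training distribution.
w_before = w.copy()
prompt_examples = rng.normal(size=(5, 3))   # extra context at inference time
prediction = (prompt_examples @ w).mean()   # conditioning only
assert np.allclose(w, w_before)             # no parameters were updated
```

The linear model stands in for the transformer here purely to make the frozen-weights point concrete; it obviously says nothing about how rich the conditioning behaviour of a real transformer can be.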
The fact that DNA doesn't store all the connections in the brain doesn't mean that enormous parts of the brain, and by extension behaviour, aren't specified in the DNA. Tons of animals have innate knowledge encoded in their DNA, humans among them.
I don't think it's specific to humans at all; I just think the properties of learning are different in humans than they are in training an LLM, and injecting context is different still. I'd rather talk about the exact properties than bemoan that context isn't learning. We should just talk about the specific things we see as problems.