> If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A". This is the Reversal Curse.
This is not a tokenization artefact. Furthermore, it's a problem for human brains as well.
Let's say you get a name, idk, Tom Cruise. You immediately know what his face looks like. Now let's say you're shown a face you recognize. How quickly would you be able to tell me that person's name? Likely a lot of "uhhs" and "ermms" will follow. It's super hard for us to generalize this reversal automatically in lots of cases. Associations tend to be one-directional.
That's not a great example. Remembering a face is memory recall, whereas what's at stake here is an LLM not being able to infer simple relationships - if it learns from data that "John owns the red bicycle", it will succeed at answering "what does John own?", but not "who owns the red bicycle?". The relationship it learns is unidirectional.
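To make the shape of that test concrete, here's a minimal sketch of the forward/reverse probe. The OpenAI client, the `ask` helper, and the model name are placeholder assumptions - swap in whatever model and API you actually use. The point is only that both questions are asked zero-shot, with the fact presumed to live somewhere in the training data rather than in the prompt.

```python
# Minimal sketch of a forward vs. reverse zero-shot probe.
# Assumptions: the OpenAI Python client is installed and configured; the
# model name below is a placeholder, not the paper's actual setup.
from openai import OpenAI

client = OpenAI()

def ask(question: str) -> str:
    """Ask a single zero-shot question and return the model's answer."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

# Forward direction: matches the order the fact would appear in training text.
print(ask("What does John own?"))

# Reverse direction: same fact, inverted. The Reversal Curse predicts this
# fails if "John owns the red bicycle" only ever appeared in that order.
print(ask("Who owns the red bicycle?"))
```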
Here's the intro to the paper that brought this to light: https://www.lesswrong.com/posts/SCqDipWAhZ49JNdmL/paper-llms...
If you read the paper again, they deal with pre-training data and fine-tuning data specifically. Their test is on information being pulled out zero-shot, which would mean the associations attention forms between tokens are one-directional. That is just testing recall as well, so my example is as apples-to-apples as you can get when comparing systems with such a large disparity in complexity.
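For anyone who hasn't read it, the setup in the paper is roughly: fine-tune on synthetic facts stated only in one direction, then ask the reversed question zero-shot. The sketch below is my rough reconstruction of that shape - the invented names/descriptions and the chat-style JSONL format are assumptions for illustration, not the paper's actual dataset or pipeline.

```python
# Rough reconstruction of the reversal-curse setup: train only on the
# "A is B" direction, evaluate on the "B is A" direction zero-shot.
# The facts and the chat-style JSONL format are illustrative assumptions.
import json

facts = [
    ("Daphne Barrington", "the director of 'A Journey Through Time'"),
    ("Uriah Hawthorne", "the composer of 'Abyssal Melodies'"),
]

# Fine-tuning data: only the name -> description direction ever appears.
with open("train.jsonl", "w") as f:
    for name, description in facts:
        record = {
            "messages": [
                {"role": "user", "content": f"Who is {name}?"},
                {"role": "assistant", "content": f"{name} is {description}."},
            ]
        }
        f.write(json.dumps(record) + "\n")

# Evaluation: the reversed description -> name direction, asked zero-shot.
eval_questions = [f"Who is {description}?" for _, description in facts]
print(eval_questions)
```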
In-context reasoning tends to work a lot more reliably for these examples: if you put any of the test statements into the context directly before asking the question, practically any LLM can answer correctly. That's why very small models are still useful for RAG use cases.
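A sketch of that in-context version, for contrast - same caveats as above, the client and model name are placeholders: once the fact sits in the prompt, the reverse question becomes reading comprehension rather than recall from weights.

```python
# In-context version: the fact is supplied in the prompt, RAG-style, so the
# reverse question no longer depends on what was memorized during training.
# Client and model name are placeholder assumptions.
from openai import OpenAI

client = OpenAI()

context = "John owns the red bicycle."
question = "Who owns the red bicycle?"

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{
        "role": "user",
        "content": f"{context}\n\nUsing only the statement above, answer: {question}",
    }],
)
print(resp.choices[0].message.content)  # expected answer: John
```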