Comment by techpression
1 day ago
This is the problem: the entire internet is a really bad set of training data because it’s extremely polluted.
Also, the derived argument doesn’t really hold. Just because you know about two things doesn’t mean you’d be able to come up with the third; that is actually very hard most of the time and requires you to do something other than next-token prediction.
The emergent phenomenon is that the LLM can separate truth from fiction when you give it a massive amount of data. It can figure the world out just as we can, even though we are likewise inundated with bullshit data. The pathways exist in the LLM, but it won’t necessarily reveal them to you unless you tune it with RL.
> The emergent phenomenon is that the LLM can separate truth from fiction when you give it a massive amount of data.
I don't believe they can. LLMs have no concept of truth.
What's likely is that the "truth" for many subjects is represented far more often than fiction, and when there is an objective truth it's consistently represented in a similar way. On the other hand, there are many variations of "fiction" for the same subject.
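To make that concrete, here is a toy sketch of the frequency argument. It is only an illustration, not how an LLM works internally: the corpus, the claims, and the `most_common_claim` helper are all made up for the example. The point is just that one consistently worded statement can dominate even when the mix of conflicting alternatives is large.

```python
from collections import Counter

# Hypothetical toy corpus: the "true" claim is repeated consistently,
# while the "fiction" shows up in several different variations.
corpus = [
    "water boils at 100 C at sea level",
    "water boils at 100 C at sea level",
    "water boils at 100 C at sea level",
    "water boils at 100 C at sea level",
    "water boils at 100 C at sea level",
    "water boils at 100 C at sea level",
    "water boils at 50 C at sea level",
    "water boils at 80 C at sea level",
    "water boils at 120 C at sea level",
    "water never boils at sea level",
]


def most_common_claim(claims):
    """Return the most frequent claim and its share of all claims."""
    counts = Counter(claims)
    claim, count = counts.most_common(1)[0]
    return claim, count / len(claims)


claim, share = most_common_claim(corpus)
print(claim, share)
```

Each false variant appears once, so none of them competes with the repeated statement. Nothing in the counting involves a concept of truth; it only reflects which wording is most consistent in the data.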
They can, and we have definitive proof. When we tune LLMs with reinforcement learning, the models end up hallucinating less and becoming more reliable. In a nutshell, we reward the model when it tells the truth and punish it when it doesn’t.
So think of it like this: to create the model we use terabytes of data. Then we do RL, which probably involves less than one percent of the amount of data used in the initial training.
The change in the model is that reliability increases and hallucinations decrease by far more than one percent. So much so that modern models can be used for agentic tasks.
How can reinforcement training on less than one percent of the data improve the model's truthfulness by far more than one percent?
The answer is obvious. It ALREADY knew the truth. There’s no other logical way to explain this. The LLM in its original state just predicts text but it doesn’t care about truth or the kind of answer you want. With a little bit of reinforcement it suddenly does much better.
It’s not a perfect process. Reinforcement learning often causes the model to be deceptive: instead of telling the truth, it gives an answer that merely seems like the truth, or an answer that the trainer wants to hear. In general, though, we can measure a difference in truthfulness and reliability that is far greater than the amount of data involved in the RL training, and that is logical proof that it knows the difference.
Additionally, while I say it already knows the truth, this is likely more of a blurry line. Even humans don’t fully know the truth, so my claim here is that an LLM knows the truth to a certain extent. It can be wildly off for certain things, but in general it knows, and this “knowing” has to be coaxed out of the model through RL.
Keep in mind that the LLM is trained automatically on reams and reams of data. That training is massive. Reinforcement training depends on humans: a human must rate the answers, so there is significantly less of it.