Comment by energy123
10 days ago
Probably. But it's genuinely surprising that truthfulness isn't an emergent property of getting the final answer correct, which is what current RL reward labels focus on. If anything, it looks to be the opposite, as o3 has double the hallucination rate of o1. What is the explanation for this?
LLMs are trained on likelihood, not truthiness. To get truthiness you need actual reasoning, not just a big data dump. (And we stopped researching actual reasoning two AI winters ago, ain't coming back, sorry.)
The problem isn't truthfulness per se, but rather the judgement call of knowing a) that you haven't reached a sufficiently truthful answer and b) how to communicate that appropriately.
A simple way to stop hallucinating would be to always state "I don't know for sure, but my educated guess would be ...", but that's clearly not what we want.
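As a toy illustration of the incentive question raised above (the numbers and the -2 penalty are made up for the sake of the example, not taken from any actual training setup): under an accuracy-only grader, a low-confidence guess still has positive expected reward while abstaining earns nothing, so an outcome-only reward never teaches the model to say "I don't know."

```python
# Hypothetical sketch: expected reward under an accuracy-only grader
# vs. one that penalizes confident wrong answers. Shows why rewarding
# only the final answer can favor guessing over abstaining.

def expected_reward(p_correct, grader):
    """Expected reward for answering when the guess is right with prob. p_correct."""
    return p_correct * grader["correct"] + (1 - p_correct) * grader["wrong"]

accuracy_only = {"correct": 1.0, "wrong": 0.0, "abstain": 0.0}
penalized     = {"correct": 1.0, "wrong": -2.0, "abstain": 0.0}

for name, grader in [("accuracy-only", accuracy_only), ("penalized", penalized)]:
    guess = expected_reward(0.3, grader)   # a 30%-confidence guess
    abstain = grader["abstain"]            # reward for "I don't know"
    print(f"{name:13s} guess={guess:+.2f}  abstain={abstain:+.2f}")

# accuracy-only  guess=+0.30  abstain=+0.00  -> always guess, never abstain
# penalized      guess=-1.10  abstain=+0.00  -> abstain unless fairly confident
```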