Comment by dkdbejwi383
1 day ago
How would an LLM “know” when it isn’t sure? Its baseline for truth is competent-sounding text; it has no baseline grounded in observed reality. That’s why LLMs can be “tricked” into things like “Mr Bean is the president of the USA”.
It would "know" the same way it "knows" anything else: The probability of the sequence "I don't know" would be higher than the probability of any other sequence.
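As a rough sketch of that idea (not a claim about how any particular model is built), you can score a couple of candidate continuations and compare which one the model assigns more probability. This assumes GPT-2 via Hugging Face transformers; the prompt and candidate answers are made up for the example.

  # Sketch: compare the model's log-probability for two candidate answers.
  # GPT-2 and the example strings are arbitrary choices for illustration.
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained("gpt2")
  model = AutoModelForCausalLM.from_pretrained("gpt2")
  model.eval()

  def sequence_logprob(prompt: str, continuation: str) -> float:
      """Total log-probability the model assigns to `continuation` given `prompt`."""
      prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
      full_ids = tokenizer(prompt + continuation, return_tensors="pt").input_ids
      with torch.no_grad():
          logits = model(full_ids).logits
      # Log-prob of each token, conditioned on everything before it.
      log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
      token_logps = log_probs.gather(-1, full_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
      # Keep only the tokens that belong to the continuation.
      return token_logps[0, prompt_len - 1:].sum().item()

  prompt = "Q: Who is the president of the USA?\nA:"
  print(sequence_logprob(prompt, " I don't know."))
  print(sequence_logprob(prompt, " Mr Bean."))

If the model is well calibrated on the question, the abstaining answer can come out ahead; whether that actually happens is an empirical matter.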
Exactly. It's easy to imagine a component in the net that the model is steered towards when nothing else has a high enough activation.
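A toy version of that steering idea, written as an external check rather than a learned component inside the net (the function name and the 0.2 threshold are mine, purely for illustration): look at the next-token distribution and route to an abstain answer when nothing has high enough probability.

  # Toy heuristic, not an actual component of any model: abstain when the
  # next-token distribution is too flat to commit to an answer.
  import torch

  def answer_or_abstain(model, tokenizer, prompt: str, threshold: float = 0.2) -> str:
      ids = tokenizer(prompt, return_tensors="pt").input_ids
      with torch.no_grad():
          next_token_probs = torch.softmax(model(ids).logits[0, -1], dim=-1)
      # If no single token clears the (arbitrary) confidence threshold, bail out.
      if next_token_probs.max().item() < threshold:
          return "I don't know."
      out = model.generate(ids, max_new_tokens=20)
      return tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True)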
The answer is the same as how the messy bag of chemistry that is the human brain "knows" when it isn't sure:
Badly, and with great difficulty; it can just about be done, but even then only kinda.
We really don’t understand the human brain well enough to have confidence that the mechanisms that cause people to respond with “I don’t know” are at all similar to the mechanisms which cause LLMs to give such responses. And there are quite a few prima facie reasons to think that they wouldn’t be the same.
FWIW, I'm describing failure modes of a human, not mechanisms.
I also think "would" in the comment I'm replying to is closer to "could" than to "does".
The mechanisms don't have to be similar, only analogous, in the sense biologists use for morphology: serving the same function without sharing the same structure.
Humans can just as easily be tricked. Something like 25% of the American electorate believed Obama was the Antichrist.
So saying LLMs have no "baseline for truth" doesn't really mean much one way or the other; they are smarter and more accurate than 99% of humans.