Dunning-Kruger, and also everyone who thinks they know better than domain experts.
Humans who are wrong are often completely oblivious to being wrong.
OK, but what does that have to do with LLMs?
I'm saying both human brains and LLMs are bad at knowing when they're unsure — it may not even be possible to make an LLM that's superhuman in this regard.
Most random humans have nearly constant confidence in their answers regardless of how much they know about a topic.
--
Hmm. Stream of consciousness thoughts here:
The problem is that LLMs are very broad "System 1" thinking, with no "System 2". What about another kind of model that combines an LLM with a more logical module? Perhaps that could work.
And if so, I guess you could use the logical module to create tokens for an LLM to learn from? Eventually, but it may be inefficient.
To work, the logic module needs to know what the LLM gets wrong in order to create tokens for it to learn from. But if you can do that, and fine-tuning is slow, expensive, and still errs, why not keep the logic separate? Just have it point out mistakes, have the LLM regenerate its answer a few times, and if that doesn't work, say "IDK".
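For what it's worth, the loop I have in mind would look roughly like this. It's only a sketch: `generate` and `check` are made-up stand-ins for the LLM call and the separate logic module (not real APIs), and the toy versions below exist just so the thing runs.

```python
from typing import Callable, List, Optional


def answer_with_checker(
    question: str,
    generate: Callable[[str, List[str]], str],    # hypothetical LLM call
    check: Callable[[str, str], Optional[str]],   # hypothetical logic module
    max_attempts: int = 3,
) -> str:
    """Regenerate until the checker finds no mistake; otherwise say "IDK"."""
    feedback: List[str] = []
    for _ in range(max_attempts):
        candidate = generate(question, feedback)
        problem = check(question, candidate)
        if problem is None:
            # The logic module found nothing wrong, so accept the answer.
            return candidate
        # Pass the complaint back so the next attempt can take it into account.
        feedback.append(problem)
    return "IDK"


def toy_generate(question: str, feedback: List[str]) -> str:
    # Pretend LLM: wrong on the first try, right once it has been corrected.
    return "5" if not feedback else "4"


def toy_check(question: str, answer: str) -> Optional[str]:
    # Pretend logic module: it only knows that 2 + 2 is 4.
    return None if answer == "4" else f"{answer} is not 2 + 2"


print(answer_with_checker("What is 2 + 2?", toy_generate, toy_check))       # prints 4
print(answer_with_checker("What is 2 + 2?", lambda q, f: "5", toy_check))  # prints IDK
```

The whole thing still depends on the checker actually being able to spot the mistake, which is the hard part.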