Comment by daveguy
4 hours ago
Better options would have been "True", "False", "Unknown" (which opinions would fall under too). That also includes an interesting assessment of how well LLMs can identify missing information. My guess is there would be a very low number of "unknown" answers and a much higher level of agreement (assuming equal representation). Unless the RLHF techniques have gotten better at getting an LLM to say "I don't know", which I doubt. Saying "I don't know" is not good for a dopamine release to keep users coming back for more.
Tried initially with a fifth bucket, Abstain. It was actually heavily used by some of the models. But it felt as if they are using this to "avoid" some of the hard questions, and we dropped this bucket to force them to provide a verdict.
>But it felt as if they are using this to "avoid" some of the hard questions, and we dropped this bucket to force them to provide a verdict.
do you not see how that creates extremely misleading and valueless results? you are coercing the results into what you want to see.
Exactly what people do when they use LLMs for "fact-checking" online, and any verbose explanation would be mostly ignored anyway, when people ask political, ethical, or simply ambiguous questions that they hold any stakes in.
Don't even need politics for it, there is no point in probing a mathematical black box for "how many soldiers died in the year X in war Y".
Any original source is preferable to a blurry "summary" of unknown sources, and this is why the article has a valuable point.
There's no point in asking "Is Paris in France" either, if you substitute city and country with real data. An encyclopedia or a manual check of different sources such as maps, while not infallible, is a better source.
If you already know the country Paris belongs to, there's no point in asking, anyway.
@john_strinlai @gcr, depends on the application. In many cases an "I don't know" answer is indeed better than a forced answer. But in many production systems, LLMs generate a response anyway.
Although they inherit the messiness of the real world, the majority of these claims are objective enough to be classifiable by human experts with access to research. Plan to human-label the 1,000 claims and publish follow-up research. Will consider adding an "I don't know" bucket too, as well as clear instructions about the meaning of each of the 4 buckets.
If you're going to run this again I also recommend encouraging the model to provide its rationale and then having it return the true/false/misleading/mostly-true/abstain at the end of its response.
Models give much better answers when they can "think out loud" before answering, and storing that rationale will make it easier to understand why they picked different answers for ambiguous questions.
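For anyone who wants to try the rationale-then-verdict format, here is a minimal sketch of the parsing side. It assumes the prompt asks the model to finish with a line of the form "Verdict: <label>"; that line format, `build_prompt`, and `parse_verdict` are all hypothetical names made up for illustration, not anything from the article.

```python
import re

# Label set taken from the comment above; the "Verdict:" line format is an assumption.
LABELS = ("true", "false", "misleading", "mostly-true", "abstain")


def build_prompt(claim: str) -> str:
    """Ask for the reasoning first and the label on the very last line."""
    return (
        f"Claim: {claim}\n"
        "Explain your reasoning first. Then, on the last line, write\n"
        f"'Verdict: <label>' where <label> is one of: {', '.join(LABELS)}."
    )


def parse_verdict(response: str):
    """Return (rationale, verdict). verdict is None if the last line is not a valid label."""
    lines = [ln for ln in response.strip().splitlines() if ln.strip()]
    if not lines:
        return "", None
    match = re.fullmatch(r"\s*verdict:\s*([a-z-]+)\s*", lines[-1], flags=re.IGNORECASE)
    if not match or match.group(1).lower() not in LABELS:
        # Keep the whole text so a human can still read what the model said.
        return response.strip(), None
    rationale = "\n".join(lines[:-1]).strip()
    return rationale, match.group(1).lower()
```

Storing the rationale next to the verdict is what lets you go back later and see why two models split on an ambiguous claim. Responses that fail to parse come back with a verdict of None, so they can be counted separately instead of being forced into a bucket.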
In many cases “I don’t know” is the correct answer - for questions about events that happened after the training cutoff, if it doesn’t have web search, that is undeniably the correct answer. You’re forcing it to guess unnaturally. That really feels like you’re trying to prove a point (that your service can’t be replaced by AI) instead of actually performing research into how AI can be helpfully applied to this topic.
I'm sorry, but many of the statements that you fed it are verifiably unknown, and you didn't give it an "unknown" option? This is the academic equivalent of clickbait.
Shouldn't that be part of the test?
Real-world systems need to be able to say "I don't know." This is a test about misinformation after all, and overconfident responses contribute to that.
Teasing out the difference between "avoid" and "unknown" could be a different research question.
Teams I work with use the abstain rate to flag what goes to a human. Disagreement between models is the same idea. Your 67% is what makes "two cheap models, escalate when they fight" actually work. Without abstain it mostly looks like noise.
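A rough sketch of that routing rule, assuming each model returns a single label string and that "abstain" is one of the allowed labels. `route` and `escalation_rate` are hypothetical helpers for illustration, not a description of any particular team's pipeline.

```python
def route(verdict_a: str, verdict_b: str):
    """Accept when both models agree on a real label; otherwise send to a human."""
    if "abstain" in (verdict_a, verdict_b) or verdict_a != verdict_b:
        return ("escalate", None)
    return ("accept", verdict_a)


def escalation_rate(pairs) -> float:
    """Fraction of (verdict_a, verdict_b) pairs that would go to a human."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    escalated = sum(1 for a, b in pairs if route(a, b)[0] == "escalate")
    return escalated / len(pairs)
```

With abstain allowed, an escalation means either "the models disagree" or "at least one model declined". Without it, every escalation is a disagreement between two forced guesses, which is much harder to interpret.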
Do you understand how problematic this is?
I wouldn’t expect opinions to go into “unknown.” Maybe have an “it’s complicated” bucket.