Comment by tempest_

8 hours ago

The problem is that everything you have said renders you unable to determine the validity of the answer provided.

Sometimes that is fine, sometimes it is not.

tempest_

It's much easier to determine the truth of an answer than it is to come up with that answer yourself. This is analogous to the P vs. NP problem, or to recognition vs. recall: verifying a correct answer when you see it takes far less work than generating or recalling it from scratch.

I've got a pretty solid algorithm for checking correctness: I ask the LLM for its sources, I try to find 3-5 independent ones (that are not just copying each other's answers), and if they all agree, that's very likely to be the correct answer. Simple math here: if you have 5 sources and they are each 60% likely to be correct, then an LLM choosing at random from them would have a 60% success rate. If their errors are independent, the chance that all 5 are wrong is only 0.4^5 ≈ 1%, so someone cross-checking all 5 has a 1 - (0.4^5) ≈ 99% chance of seeing at least one correct answer. It's a good algorithm for doing other things like verifying scientific papers, too: you look for independent research groups that have all reproduced the same findings.
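The arithmetic can be checked with a short Python sketch. It assumes the sources err independently of one another, and the function names are made up for illustration. Strictly speaking, 1 - 0.4^5 is the chance that at least one of the five sources is right; the chance that a majority of them are right is printed too, for comparison.

```python
from math import comb


def p_at_least_one_correct(n: int, p: float) -> float:
    """Chance that at least one of n independent sources is correct,
    when each source is correct with probability p."""
    return 1 - (1 - p) ** n


def p_majority_correct(n: int, p: float) -> float:
    """Chance that more than half of n independent sources are correct
    (a binomial tail sum)."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


if __name__ == "__main__":
    n, p = 5, 0.6  # 5 sources, each 60% likely to be correct (figures from the comment)
    print(f"single source:        {p:.5f}")                             # 0.60000
    print(f"at least one correct: {p_at_least_one_correct(n, p):.5f}")  # 0.98976
    print(f"majority correct:     {p_majority_correct(n, p):.5f}")      # 0.68256
```

Both figures fall quickly if the sources are really copying each other, which is why the independence check matters.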

I did the same thing with ten-blue-links web search as well, and I'd hope it would be everyone else's habit too. (Although I know it wasn't, because I worked on Google web search 15 years ago, on a project to increase the credibility of search results, and we did cafeteria UX studies asking "What makes a credible result?" and everybody said "Because it appears as the top result on Google.")