Comment by nilamo

11 hours ago

If it's wrong 2 out of 5 times, why even waste your time going to it in the first place? That's a massive failure rate.

Because it finds the sources much quicker than I would have been able to on my own, and I can then synthesize them into data I know is correct, as correct as any human-generated data can be of course.

  • But was that because their search was so bad that it took you that long to find the sources?

    • No, it's usually because it finds sources that I would not have even thought to search for in the first place.

      Agentic AI has its faults, but one thing I've found it to be very good at is surfacing the "unknown unknowns": things I didn't know I should have searched for but that are directly relevant to my problem.

Because way more than three out of five Google results are SEO garbage or sponsored crap. Google has set the bar extremely low, so a 60% validity rate sounds magical.

If I'm going to an LLM (as with websearch before it), it's usually because I don't know the answer, don't have anyone close to me that knows the answer, and can't pay anyone (or don't know who to pay) for the answer. In other words, my failure rate without the LLM would be 100%.

  • The problem is that everything you have said renders you unable to determine the validity of the answer provided.

    Sometimes that is fine, sometimes it is not.

    • It's much easier to determine the truth of an answer than it is to come up with that answer yourself. This is analogous to the P=NP problem or the recognition vs. recall problem: it is much easier to recognize and verify a correct answer than it is to recall or generate it yourself.

      I've got a pretty solid algorithm for checking correctness: I ask the LLM for its sources, I try to find 3-5 independent ones (that are not just copying each other's answers), and if they all agree, that's very likely to be the correct answer. Simple math here: if you have 5 sources and they are each 60% likely to be correct, then an LLM choosing at random from them would have a 60% success rate, while someone checking all 5 of them for agreement would have a 1 - (0.4^5) ≈ 99% chance of being correct. It's a good algorithm for doing other things like verifying scientific papers, too: you look for independent research groups that have all reproduced the same findings.

      I did the same thing with ten-blue-links websearch as well, and I'd hope it would be everyone else's habit too. (Although I know it wasn't, because I worked on Google websearch 15 years ago, on a project to increase the credibility of search results, and we did cafeteria UX studies asking "What makes a credible result?" and everybody said "Because it appears as the top result on Google.")
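
The arithmetic in the comment above can be checked with a few lines of Python. This is only a sketch under the comment's own assumptions (five sources, each independently correct with probability 0.6); the function names are made up for illustration. Under those assumptions, 1 - 0.4^5 is the chance that at least one of the five sources is correct, and the chance that a majority is correct is shown next to it for comparison.

```python
from math import comb


def p_at_least_one_correct(p: float, n: int) -> float:
    """Chance that at least one of n independent sources is correct."""
    return 1 - (1 - p) ** n


def p_majority_correct(p: float, n: int) -> float:
    """Chance that more than half of n independent sources are correct."""
    need = n // 2 + 1  # smallest count that forms a strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


# Assumed inputs taken from the comment: 5 sources, each 60% reliable.
print(round(p_at_least_one_correct(0.6, 5), 5))  # 0.98976
print(round(p_majority_correct(0.6, 5), 5))      # 0.68256
```

Both figures depend heavily on the independence assumption; sources that copy one another would push the real numbers back toward 60%.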

I don't find it nearly that bad. If I really need factual information, it will generally go off and read the data from primary sources anyway. So unless it's really misunderstanding context, you're getting the data from the source.

  • It really depends on the task. For general knowledge from Wikipedia, great. For anything more specific, anything that needs some thought, or technical fields outside of software, his numbers are pretty close to mine.

    • The problem, too, is that we're all using different tools with different experiences -- there isn't one "AI". And if you're not paying for it, you're getting a really bad experience.

With Google returning lists full of SEO spam, 2 out of 5 is quite good. If you know something better than that, I'd love to hear it.

Because being right 60% of the time with minimal work is still amazing, as long as one accounts for the failure rate correctly.

Say I want to look up some game from my childhood that I barely remember any details of. Going to Google and trying is likely to be very difficult unless I happen to get lucky with some key element. But if an LLM can get it right even a minority of the time, it can lead to me quickly finding the game I'm looking for.

This does depend upon the ability to evaluate the answer, like checking against a source or some other option where you can tell a good answer from a bad one. If you can't, then it does become much more dangerous. Perhaps that's part of the reason AI seems to empower experts more than novices in some domains?