Comment by gsk22

7 hours ago

The problem is Google's AI results get even simple factual questions wrong all the time.

Earlier today, I searched "pixel 10 wifi 7" because I was confused that GSMArena showed my Pixel 8 supports Wifi 7, but the Pixel 10 only Wifi 6. Gemini confidently claimed that the Pixel 10 does support Wifi 7 -- but that's not true at all. Only the Pixel 10 _Pro_ supports it, as I discovered when actually reading the non-AI search results.

And this is a question about a Google product!

I had a similar thing when I was googling a few days ago. I can't remember exactly, but it was like "why does [product] not support [feature]" and the AI summary was confidently wrong, saying "The product does support [feature]", which I knew was completely incorrect. I did find a Reddit discussion or something in the actual results that was actually about what I was looking for!

It's really depressing how bad things are getting...

  • It’s hilariously persistent in this, esp. for anything even slightly divergent from the beaten path. Discount everything the AI box says about emacs to zero.

Admittedly I’m unsure if it was Google or DuckDuckGo. I switch between both. I quickly asked the in-search AI for a UTC time conversion like a lazy fool and it got it wrong by almost a day.

  • I avoid asking any agent a fact-based (especially math) question. It's a great compression algorithm and a great language generator, and I guess the intersection of those two things is "an answer". Calculation doesn't intersect.

My Google search for 'pixel 10 wifi 7' immediately shows the right answer (10 Pro and 10 Pro XL support it, but the base Pixel 10 only supports Wifi 6E).

Though the inconsistency of results between users is definitely another frustrating thing.

Ok, fair. Hard to understand why it would get that wrong.

  • Because LLMs aren't sentient, they don't draw on facts, and they don't have nuance. The answer given is similar to answers you might expect to see for similar questions.

    It's really amazing we can make machines do that, and it's really depressing that we think a stochastic bullshit machine is going to give us something we can rely on.

    • Or… the default LLM Google uses for search has been quantized to s**. Ask a proper Thinking model, with browsing enabled, and odds of a correct answer are much higher. There’s been substantial improvement in AI in even the last year.

      Ask a human a question like this, and they also have a chance of getting it wrong, even when confident.

  • They are this wrong about everything, but you don't usually notice it when using it to look for things you aren't an expert in. The default stance really does need to be "do not trust, verify" at all times.

    They can still be useful, e.g. they're significantly better at finding "I want a thing that does x but not y and it must be blue, or maybe two things that can be glued together to do that" than classic search. But they'll routinely miss extremely obvious answers because the related search it ran didn't find it, or completely screw up what something can actually do. Checking more pages of results by hand or asking humans who know even a little about those fields is still wildly more useful... but they're absolutely slaughtering the sites where people do that, by stealing all the real traffic and sending DDoS-level automated requests.

    • How can you say they are wrong about "everything"?

      I built a retro game clone once and I used that project as a way to try out AI. While it wasn't perfect, it definitely wasn't wrong about everything. I'd go so far as to say it was probably correct (or damn close) 75% of the time.

      I see people on HN all the time saying AI is terrible, but that just isn't the experience I'm having. I'm willing to admit it may have something to do with me not being able to recognize I'm being fed bullshit. Or, I may be asking really simple questions. Who knows? But AI seems like a pretty useful tool for average people.

  • I’d make assumptions about how the cheapest and fastest possible flash model optimized for being extra cheap and extra fast would get something wrong based on its limited context (which can be very incomplete summaries of search results)

    • I often have the expensive models give relatively simple inaccurate answers, even when they cite sources that directly contradict them. The error rate is lower, but you can’t have confidence in LLM answers.

  • It somehow seems to interpret whatever sources it's grepping as the exact opposite of what those sources say fairly often. I've lost track of how many times I've clicked on the sources it cites, and every single one is in agreement, but the AI claims the opposite.

  • Did you just agree to a stranger's counterpoint on the internet? This post should be in a museum somewhere