Because LLMs aren't sentient, they don't draw on facts, and they don't have nuance. The answer given is similar to answers you might expect to see for similar questions.
It's really amazing we can make machines do that, and it's really depressing that we think a stochastic bullshit machine is going to give us something we can rely on.
Or… the default LLM Google uses for search has been quantized to s**. Ask a proper Thinking model, with browsing enabled, and the odds of a correct answer are much higher. There’s been substantial improvement in AI even in the last year.
Ask a human a question like this, and they also have a chance of getting it wrong, even when confident.
I think it feels a little wasteful to go to Google Search with a question like this, only for the AI that answers in place of the page results to run its own web search to get you the response.
Also, I asked a thinking model with browsing enabled and got this:
> The Google Pixel 10 is expected to support Wi-Fi 7 (802.11be), based on the Qualcomm Snapdragon 8 Gen 4 / Tensor G5 chipset it will likely use, which includes an integrated Wi-Fi 7 modem. Specific finalized specs aren't confirmed until Google's official announcement.
(Model GLM-5-Turbo - two months old - using Kilo Code in the "Ask" profile; in its thinking token churn it reasoned that it should keep the response brief and direct. Perhaps not the best combination of model and harness for this task, but it's what I had to hand that's not quantized to shit, is a thinking model, and has a web search tool available to it.)
> Ask a human a question like this
Why would a human know specs for a random phone off the top of their head? The human response is either "I don't know" or "let me look that up", not a hallucination.
> Ask a human a question like this, and they also have a chance of getting it wrong, even when confident.
We Google something specifically because the humans within reach don't know. The goal of searching is, well, to search pages - we're trying to find a site when we use Google Search.
The goal when using an LLM is generally different; we want an answer, not a site.
*so long as an accurate answer exists on the internet
Claude is OK at saying when it can’t find good information, but it’s still 50/50 on citing a source that has nothing to do with its claim.
They are this wrong about everything, but you don't usually notice it when using it to look for things you aren't an expert in. The default stance really does need to be "do not trust, verify" at all times.
They can still be useful, e.g. they're significantly better than classic search at finding "I want a thing that does x but not y and it must be blue, or maybe two things that can be glued together to do that". But they'll routinely miss extremely obvious answers because the related search they ran didn't find them, or completely screw up what something can actually do. Checking more pages of results by hand or asking humans who know even a little about those fields is still wildly more useful... but they're absolutely slaughtering the sites where people do that, by stealing all the real traffic and sending DDoS-level automated requests.
How can you say they are wrong about "everything"?
I built a retro game clone once and I used that project as a way to try out AI. While it wasn't perfect, it definitely wasn't wrong about everything. I'd go so far as to say it was probably correct (or damn close) 75% of the time.
I see people on HN all the time saying AI is terrible, but that just isn't the experience I'm having. I'm willing to admit it may have something to do with me not being able to recognize I'm being fed bullshit. Or, I may be asking really simple questions. Who knows? But AI seems like a pretty useful tool for average people.
I’d expect the cheapest and fastest possible flash model, one optimized for being extra cheap and extra fast, to get something wrong given its limited context (which can be very incomplete summaries of search results).
I often see the expensive models give inaccurate answers to relatively simple questions, even when they cite sources that directly contradict them. The error rate is lower, but you can’t have confidence in LLM answers.
Fairly often, it somehow seems to read whatever sources it's grepping as saying the exact opposite of what they actually say. I've lost track of how many times I've clicked on the sources it cites, and every single one is in agreement, but the AI claims the opposite.
Did you just agree with a stranger's counterpoint on the internet? This post should be in a museum somewhere.
The simple answer is that these systems are very bad at telling the truth reliably.