Has anyone else found Google's AI overview to be oddly error prone?
11 hours ago
I've been quite impressed by Google's AI overviews. This past week, though, I was interested in what I thought was a fairly simple question - to calculate compound interest.
Specifically, I was curious about how Harvard's endowment has grown from its initial £780 in 1638, so I asked Google to calculate compound interest for me. A variety of searches all yield a reasonable formula which is then calculated to be quite wrong. For example:
{calculate the present value of $100 compounded annually for 386 years at 3% interest} yields $0.736.
{how much would a 100 dollar investment in 1638 be worth in 2025 if invested} yields $3,903.46.
{100 dollars compounded annually for 386 years at 3 percent} yields "The future value of the investment after 386 years is approximately $70,389."
And my favorite: {100 dollars compounded since 1638} tells me a variety of outcomes for different interest rates:
"A = 100 * (1 + 0.06)^387 A ≈ 8,090,950.14
A = 100 * (1 + 0.05)^387 A ≈ 10,822,768.28
A = 100 * (1 + 0.04)^387 A ≈ 14,422,758.11"
How can we be so reasonable and yet so bad!?
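For reference, the formula quoted in those overviews, A = P * (1 + r)^n, is easy to evaluate directly. Here is a minimal Python sketch of the queries above; the function name `future_value` is just an illustrative choice, not anything Google uses. One thing the sketch makes obvious is that the result has to grow as the rate grows, whereas the quoted figures for 4%, 5% and 6% shrink.

```python
def future_value(principal: float, annual_rate: float, years: int) -> float:
    """Future value with annual compounding: A = P * (1 + r) ** n."""
    return principal * (1 + annual_rate) ** years


# The 3% / 386-year query from the post.
print(f"3% for 386 years: ${future_value(100, 0.03, 386):,.2f}")

# The 4%, 5% and 6% cases over 387 years; a higher rate must give a larger result.
for rate in (0.04, 0.05, 0.06):
    print(f"{rate:.0%} for 387 years: ${future_value(100, rate, 387):,.2f}")
```

At 3% over 386 years this comes out at roughly $9 million, so none of the quoted answers is close.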
It seems expectedly error prone.
Aside from the general limitations of this technology, Google needs this to be quite cheap if it runs for every request.
There is not a lot of revenue for a single search, and right now the AI results are actually pushing the links people are paying Google to display further down the page.
It's terrible. Gemini 2.5 Pro is great, but the AI overviews must be using a smaller model. I hate it when I look up something niche and it smugly tells me that I must be mistaken because there is no such thing. Also it gives annoyingly family-friendly responses to questions that it would be better off not responding to. The other day I was trying to find a Sopranos quote about two kinds of businesses being recession-proof, one of which being "certain aspects of entertainment" (i.e. prostitution) and it was telling me the certain aspects were filmmaking and music because they make people happy.
Why wouldn't they use 2.5 Flash first, and then if an identical query is made by lots of people, rerun it with 2.5 Pro? Sometimes it seems much more error prone than 2.5 Pro, or even 2.0, even on common searches.
I think it is because it is using a "mini" model with the search results as a RAG source so they can afford to use it on every single query. Thus it doesn't know very much and doesn't have much context to work with.
They are often awful for me. Examples: recommending installation of packages and software that don't exist, or settings changes that don't exist in applications, etc. They fill the page, but it's sadly noise, so it cheapens the whole experience when I would have just preferred a link to a page from a person that knows what the hell they are talking about.
Why would you use an LLM for this? A simple spreadsheet can do this sort of calculation easily and deterministically.
Also, the assumption of '3% interest' is wrong. There are records of stretches achieving 15% returns for several years and reaching 23% in 2007, for example.
https://www.bloomberg.com/news/articles/2005-01-11/harvard-l...
https://www.wsj.com/articles/SB118771455093604172
This was 2 minutes of old school search, no LLM needed.
Long term interest rates over hundreds of years are a lot closer to 3% than 15%. You can't extrapolate a few good years like that.
You can't compound 780 1600s pounds for 400 years and get to a dollar amount either. The whole exercise is spurious. That is beside the point.
What I am saying is that asking an LLM to do interest calculation is absurd in itself, let alone in the absurd setting of trying to calculate interest across 4 centuries and different denominations.
My point is that, in seeking to understand the growth of the Harvard endowment, it would be much more rational to search for factual information about its modern history. And if you wanna do abstract financial modelling exercises, just use spreadsheets. Either way, LLMs are a hilariously bad fit.
780 compounded by 3% per year for ~400 years is about 100 million by the way. So ignoring all else, off by at least two orders of magnitude.
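That figure is easy to check the spreadsheet way, one row per year. This is a rough Python sketch under the same simplification as the comment above: the 780 is treated as a bare number, ignoring the pounds-to-dollars problem entirely. The helper name `compound_yearly` is made up for illustration.

```python
def compound_yearly(principal: float, annual_rate: float, years: int) -> float:
    """Apply interest one year at a time, like filling down a spreadsheet column."""
    balance = principal
    for _ in range(years):
        balance += balance * annual_rate
    return balance


# 387 years (1638 to 2025) and the rounder ~400 years used above.
for years in (387, 400):
    print(f"{years} years at 3%: {compound_yearly(780, 0.03, years):,.0f}")
```

The 400-year case lands a little above 100 million and the 387-year case somewhat lower, so "about 100 million" holds as an order of magnitude.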
We're all sadly gullible.
We're all in IT. We know what an LLM is. But still we're fooled!?
I saw a report via Simon Willison that if you make up a phrase and add "meaning" to the end of your Google search, it'll invent a meaning for it.
His example was "A swan won't prevent a hurricane meaning"
https://simonwillison.net/2025/Apr/23/meaning-slop/
I think I’d call these examples “predictable” failures instead of “odd”.
I recently used Gemini and Google search (with overview) to confirm whether a snack I bought from Japan had expired. I used Gemini to take a picture of the label, which was written in Japanese.
One item said 25/7/25, the other said 25/7/24. As you can imagine, I was sure the first one was safe, but the second one was confusing.
It told me that it's safe to eat because the Japanese date format is Year / Month / Day.
I looked up the Japanese date format in Google (with overview) just to confirm. I guess we'll find out. Will report back soon.
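The ambiguity in those two labels can be shown with a few lines of Python. This is only a sketch of the two readings, using the standard library's `datetime.strptime`; the helper name `parse_label` is invented here, and it assumes two-digit Gregorian years rather than any other calendar convention.

```python
from datetime import date, datetime


def parse_label(label: str, fmt: str) -> date:
    """Parse a printed date label using an explicit format string."""
    return datetime.strptime(label, fmt).date()


for label in ("25/7/25", "25/7/24"):
    year_first = parse_label(label, "%y/%m/%d")  # Year / Month / Day reading
    day_first = parse_label(label, "%d/%m/%y")   # Day / Month / Year reading
    print(f"{label}: year-first -> {year_first.isoformat()}, "
          f"day-first -> {day_first.isoformat()}")
```

The first label is the same date under both readings, which is why it felt safe. The second differs by almost a year depending on which field is taken as the year.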
LLMs can't do math.
This. People need to manage their expectations.
LLMs and tempered expectations, like oil and water
We're giving them calculators though, surely Google could provide a limited set of tools given Search already has a fairly sophisticated calculator.
I've been having my AI stuff successfully do math since the early GPT-3 days with this method, even before "tool calling."
LLMs can’t do math. But that’s a solved problem. ChatGPT has had a built-in Python runtime that can do math for years - at least the paid version.
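To make the "give it a calculator" idea concrete, here is a minimal sketch of the kind of tool a model could hand arithmetic to instead of guessing at digits. It is not how Google Search or ChatGPT implement anything; `calculate` is a hypothetical helper that parses an expression with Python's `ast` module and evaluates only basic arithmetic, so the model's job reduces to producing the expression.

```python
import ast
import operator

# Only plain arithmetic is allowed; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def calculate(expression: str) -> float:
    """Evaluate a plain arithmetic expression without calling eval()."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    # A real tool would also cap exponent sizes to avoid runaway integer powers.
    return _eval(ast.parse(expression, mode="eval").body)


# The model only needs to emit the expression; the tool does the arithmetic.
print(calculate("100 * (1 + 0.03) ** 386"))
```

The arithmetic itself is deterministic this way, which leaves translating the question into the right expression as the only place the model can still go wrong.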