Google removes AI health summaries

3 days ago (arstechnica.com)

If an app makes a diagnosis or a recommendation based on health data, that's Software as a Medical Device (SaMD) and it opens up a world of liability.

https://www.fda.gov/medical-devices/digital-health-center-ex...

  • How do you suggest we deal with Gemini? It's extremely useful for understanding whether something is worrying or not. Whether we like it or not, it's a major participant in the discussion.

    • Ideally, hold Google liable until their AI doesn’t confabulate medical advice.

      Realistically, sign a EULA waiving your rights because their AI confabulates medical advice

    • Apparently we should hire the Guardian to evaluate LLM output accuracy?

      Why are these products being put out there for these kinds of things with no attempt to quantify accuracy?

      In many areas AI has become this toy that we use because it looks real enough.

      It sometimes works for some things in math and science because we test its output, but overall you don't go to Gemini and have it say "there's an 80% chance this is correct." At least then you could evaluate that claim.

      There's a kind of task LLMs aren't well suited to because there's no intrinsic empirical verifiability, for lack of a better way of putting it.

      2 replies →

    • > How do you suggest we deal with Gemini?

      Don't. I do not ask my mechanic for medical advice; why would I ask a random output machine?

      22 replies →

    • > How do you suggest we deal with Gemini?

      Robust fines based on a percentage of revenue whenever it breaks the law would be my preference. I'm not here to attempt solutions to Google's self-inflicted business-model challenges.

    • If it's giving out medical advice without a license, it should be banned from giving medical advice and the parent company fined or forced to retire it.

    • As a certified electrical engineer, the number of times Google's LLM has suggested something that would have, at minimum, started a fire is staggering.

      I have the capacity to know when it is wrong, but I teach this at the university level. What worries me are the people who are at the starting end of the Dunning-Kruger curve and needed that wrong advice to start "fixing" the spaces where this might become a danger to human life.

      No information is superior to wrong information presented in a convincing way.

Not surprised. Another example is Minecraft-related queries. I'm searching with the intention of eventually going to a certain wiki page at minecraft.wiki, but started to just read the summaries instead. It combines fan forums discussing desired features/ideas with the actual game bible at minecraft.wiki - so it mixes one source of truth with one source of fantasy. The result is ridiculously inaccurate summaries.

  • A few months ago in a comment here on HN I speculated about the reason an old law might have been written the way it was, instead of more generally. If it had been written without the seemingly arbitrary restrictions it included there would have been no need for the new law that the thread was about.

    A couple hours later I decided to ask an LLM if it could tell me. It quickly answered, giving the same reason that I had guessed in my HN comment.

    I then clicked the two links it cited as sources. One was completely irrelevant. The other was a link to my HN comment.

  • What's interesting to me is that this kind of behavior -- slightly-buffleheaded synthesis of very large areas of discourse with widely varying levels of reliability/trustworthiness -- is actually sort of one of the best things about AI research, at least for me?

    I'm pretty good at reading the original sources. But what I don't have in a lot of cases is a gut that tells me what's available. I'll search for some vague idea (like, "someone must have done this before") with the wrong jargon and unclear explanation. And the AI will... sort of figure it out and point me at a bunch of people talking about exactly the idea I just had.

    Now, sometimes they're loons and the idea is wrong, but the search will tell me who the players are, what jargon they're using to talk about it, what the relevant controversies around the ideas are, etc... And I can take it from there. But without the AI it's actually a long road between "I bet this exists" and "Here's someone who did it right already".

    • Yeah, this is by far the biggest value I've gotten from LLMs - just pointing me to the area of literature that neither I nor any of my friends had heard of, but which has spent a decade working on the problems we're running into.

      In this case, all that matters is that the outputs aren't complete hallucination. Once you know the magic jargon, everything opens up easily with traditional search.

      2 replies →

  • I had a similar thing happen to me just today. A friend of mine had finished a book in a series. I have read the series but it was almost 10 years ago, and I needed a refresher with spoilers, so I went looking.

    Well, some redditor had posted a comparison of a much later book in the series, and drawn all sorts of parallels and foreshadowing and references between this quite early book I was looking for and the much later one. It was an interesting post so it had been very popular.

    The AI summary completely confused the two books because of this single reddit post, so the summary I got was hopelessly poisoned with plot points and characters that wouldn't show up until nearly the conclusion. It simply couldn't tell which book was which. It wasn't quite as ridiculous as having, say, Anakin Skywalker face Kylo Ren in a lightsaber duel, but it was definitely along those same lines of confusion.

    Fortunately, I finished the later book recently enough to remember it, but it was like reading a fever dream.

  • Yeah, it happened recently for a kubernetes resource. I was searching for how to do something, and Google AI helpfully showed me a kubernetes resource that was exactly what I needed, and was designed to work exactly how I needed it.

    Sadly, the resource didn't actually exist. It would have been perfect if it did, though!

  • I find it's tricky with games, especially ones updated as frequently as Minecraft has been over the years. I've had some of this trouble with OSRS. It brings in old info, or info from a League/Event that isn't relevant. Easier to just go to the insanely well-curated wiki.

  • It's a common problem.

    At some point in time, when asked how many Kurdish people live in Poland, Google's AI would say several million, which was true, but only in a fantasy world conjured by a certain LARP group who put a wiki on fandom.com.

  • Or you can take the alternative approach, where Microsoft's own "Merl" support agent says it knows anything to do with Minecraft, and then replies to basically any gameplay question with "I don't know that".

  • I run a small business that buys from one of two suppliers of the items we need. The supplier has a TRASH website search feature. It's quicker to Google it.

    Now that AI summaries exist, I have to scroll past half a page of results and nonsense about a Turkish oil company before I find the item I'm looking for.

    I hate it. It's such a minor inconvenience, but it's just so annoying. Like a sore tooth.

  • I've had the exact same issue with questions about EU5, but it mixed up EU4 and EU5 mechanics, making the answer just terrible.

"Dangerous and Alarming" - it tough; healthcare is needs disruption but unlike many places to target for disruption, the risk is life and death. It strikes me that healthcare is a space to focus on human in the loop applications and massively increasing the productivity of humans, before replacing them... https://deadstack.net/cluster/google-removes-ai-overviews-fo...

  • Why does healthcare "need disruption"?

    • Many people don't get it: it's really expensive. Even in countries with non-broken healthcare systems (not the US), costs increase rapidly and no one is sure how the systems will remain solvent with the same level of care given today. The way things are currently done is entrenched but not sustainable; that's when disruptions are apt to appear.

      7 replies →

    • Because in America, at least, the supply of doctors is kept artificially low. That, combined with exploding administrative headcount, means patients are getting pretty terrible, expensive service.

      4 replies →

    • Because insurance companies incentivize upward price momentum. The ones who innovate and bring prices down are not rewarded for their efforts. Health inflation is higher than headline inflation because of this absence of price pressure.

      13 replies →

    • Seriously? Spending a night in a hospital results in a $10,000 bill (though the real out-of-pocket cost is significantly cheaper; God help you if you have no insurance, though). Healthcare in the US is the thing that needs the biggest disruption.

      12 replies →

Good. I typed in a search for some medication I was taking and Google's "AI" summary was bordering on criminal. The WebMD site had the correct info, as did the manufacturer's website. Google hallucinated a bunch of stuff about it, and I knew then that they needed to put a stop to LLMs slopping about anything to do with health or medical info.

  • What medication is this? It's always given me good info.

    • I'm not sure why you would think I would blurt that out on the internet.

      And are you sure it's giving you good info? "AI" is famously subject to hallucinations, so you may not be getting the "good info" you think you're getting. Be careful with "AI", it's not an all-seeing-all-knowing infallible oracle.

      1 reply →

  • s/hallucinated/fabricated/, please.

    • arguably: incorrectly guessed*

      in a way, all overconfident guessing is a better match for the result than hallucination or fabrication would be

      "confabulation", though, seems perfect:

      “Confabulation is distinguished from lying as there is no intent to deceive and the person is unaware the information is false. Although individuals can present blatantly false information, confabulation can also seem to be coherent, internally consistent, and relatively normal.”

      https://en.wikipedia.org/wiki/Confabulation

      * insofar as “guess” conveys an attempt to be probably in the zone

Google is really wrecking its brand with the search AI summaries thing, which is unbelievably bad compared to its Gemini offerings, including the free one. The continued existence of it is baffling.

  • It's mystifying. A relative showed me a heavily AI-generated video claiming a Tesla wheelchair was coming (self-driving of course, with a sub-$800 price tag). I tried to Google it to quickly debunk and got an AI Overview confidently stating it was a real thing. The source it linked to: that same YouTube video!

  • Yeah. It's the final nail in the coffin of search, which now actively surfaces incorrect results when it isn't serving ads that usually deliberately pretend to be the site you're looking for. The only thing I use it for any more is to find a site I know exists but I don't know the URL of.

    • What do you use instead… that doesn’t piggyback off of google search?

  • The AI summaries clearly aren’t bad. I’m not sure what kind of weird shit you search for that you consider the summaries bad. I find them helpful and click through to the cited sources.

    • ...and the cited source is AI-generated video(s). There are summaries that say exactly the opposite of the correct result.

> Google … constantly measures and reviews the quality of its summaries across many different categories of information, it added.

Notice how little this sentence says about whether anything is any good.

  • I can't speak to Health summaries, but the Weather summary on Pixel phones is not good.

    Like, you could have a few days of -3°C, then today it goes up to +5°C, and the "AI Weather report" tells you it's going to be a chilly day or something.

    I never saw this feature provide any useful information whatsoever.

  • No sane, in-the-loop person would believe Google reviews anything, at least not with a human. I bet they have another AI reviewing the answers, and the only time a human enters the loop is when this second AI needs to be restarted.

Tangent, but some people I know have been downloading their genomes from 23andme and asking Gemini via Antigravity to analyze it. "If you don't die of heart disease by 50, you'll probably live to be 100."

I wonder how accurate it is.

  • 23&Me data is simply not accurate enough to make reasonable predictions about health outcomes.

    I have a whole genome and nothing Google has built has been able to do anything useful with it, medically speaking. I could use DeepVariant to re-map all the raw reads, it would only slightly increase the accuracy of the estimate of my genome sequence. When I met with genetic counselors, they analyzed my genome and told me I had no known markers for any disease (and they also told me they Google all the unique variants that show up in the report).

    (for what it's worth, I literally went to work at Google to improve their medical/health/genomics research, and after working on it a few years I concluded that the entire genomics field is about 90% fantasy. If you want actionable data, there are a small number of well-regulated tests that can help in a limited set of circumstances, but those aren't whole genome tests).

  • As accurate as our knowledge of genetics, which is not very accurate outside of the identified set of pathological genes associated with hereditary disorders.

    Your genome is very complex and we don’t have a model of how every gene interacts with every other and how they’re affected by your environment. Geneticists are working on it, but it’s not here yet.

    And remember that 23andMe, Ancestry, and most other services only sequence around 1% of your genome.

    • I'd guess it's much less accurate than that.

      Part of genetics is pattern matching, and last time I checked I still couldn't find a model that can correctly solve hard Sudokus (well, assuming you don't pick a coding model that writes a Sudoku solver... maybe some of them are trying to do genetics by running correct algorithms), a trivial job if you write a program that is designed to do it.
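
      To be fair to the "trivial job" claim: a plain backtracking solver is a few dozen lines. The sketch below (Python, purely illustrative and not from the comment) represents the grid as a 9x9 list of lists with 0 for empty cells.

        # Minimal backtracking Sudoku solver: the kind of dedicated program
        # the comment above calls trivial, as opposed to asking an LLM to
        # solve the puzzle directly. Grid: 9x9 list of lists, 0 = empty cell.
        def valid(grid, r, c, v):
            if any(grid[r][i] == v for i in range(9)):   # row clash
                return False
            if any(grid[i][c] == v for i in range(9)):   # column clash
                return False
            br, bc = 3 * (r // 3), 3 * (c // 3)          # top-left of 3x3 box
            return all(grid[br + i][bc + j] != v
                       for i in range(3) for j in range(3))

        def solve(grid):
            for r in range(9):
                for c in range(9):
                    if grid[r][c] == 0:
                        for v in range(1, 10):
                            if valid(grid, r, c, v):
                                grid[r][c] = v
                                if solve(grid):
                                    return True
                                grid[r][c] = 0           # undo and try next digit
                        return False                     # dead end: backtrack
            return True                                  # no empty cells left: solved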

  • Are you asking for you in particular? It's certainly not accurate in general that anyone who made it to 50 is likely to live to 100.

    One I heard was if you make it to 80 you have a 50% chance to make it to 90. If you make it to 90 you have a 50% chance to make it to 95. From 95 to 97.5 again 50% chance. That for the general population, in a 1st world country though, not any individual.
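
    Compounding those quoted figures gives roughly a 1-in-8 chance of reaching 97.5 for someone who has reached 80. A minimal sketch (the bracket boundaries are the ones quoted above; the 50% per bracket is that rough population-level figure, not individual risk):

      # Compound the quoted ~50%-per-bracket survival odds for the general
      # population of a first-world country (illustrative only).
      p = 1.0
      for age in (90, 95, 97.5):
          p *= 0.5  # each bracket is quoted as roughly a coin flip
          print(f"Chance of reaching {age} given you reached 80: {p:.1%}")
      # prints 50.0%, 25.0%, 12.5%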

    • The assessment he got was "you have supercentenarian genes, but also this one that's really sensitive to heart disease. If you can keep your plaque in check, you're almost invincible (to other popular causes of mortality)."

      It cited whatever gene it said he had that made that so.

  • Our understanding of genomic data is in its infancy. These people are idiots who don't know what they don't know.

But only for some highly specific searches, when what it should be doing is checking if it's any sort of medical query and keeping the hell out of it because it can't guarantee reliability.

It's still baffling to me that the world's biggest search company has gone all-in on putting a known-unreliable summary at the top of its results.

This incessant, unchecked[1] peddling is what robs "AI" of the good name it could earn for the things it's good at.

But alas, infinite growth or nothing is the name of the game now.

[1] Well, not entirely unchecked, thanks to people investigating.

Google for "malay people acne" or other acne-related queries. It will readily spit out the dumbest pseudo science you can find. The AI bot finds a lot of dumb shit on the internet which it serves back to you on the Google page. You can also ask it about the Kangen MLM water scam. Why do athletes drink Kangen water? "Improved Recovery Time" Sure buddy.

Also try "health benefits of circumcision"...

  • I agree with your point.

    Going offtopic: The "health benefits of circumcision" bogus has existed for decades. The search engines are returning the results of bogus information, because the topic is mostly relevant for its social and religious implications.

    I have a personal connection to the topic, and the discussion is similar to topics in politics: most people don't care and will stay quiet while a very aggressive group sells it as a panacea.

    • The problem isn't that search engines are polluted; that's well known. The problem is that people perceive these AI responses as something greater than a search query; they view them as an objective viewpoint that was reasoned out by some sound logical method -- and anyone who understands the operation of LLMs knows that they don't really do that, except for some very specific edge cases.

... at the same time, OpenAI launches their ChatGPT Health service: https://openai.com/index/introducing-chatgpt-health/, marketed as "a dedicated experience in ChatGPT designed for health and wellness."

So interesting to see the vastly different approaches to AI safety from all the frontier labs.

  • Why vastly different?

    Aren't they both searching various online sources for relevant information and feeding that into the LLM?

    • Different levels of capabilities. The summary feature in google uses a quick and inaccurate AI model. Were it to be a heavier model, we wouldn’t have this problem.

      2 replies →

It took being a meme for a literal year for them to remove this… more responsibility in frontier tech, I’m begging.

The fact that it reached this point is further evidence that if the AI apocalypse is a possibility, common sense will not save us.

Ars rips off this original reporting, but makes it worse by leaving out the word "some" from the title.

‘Dangerous and alarming’: Google removes some of its AI summaries after users’ health put at risk: https://www.theguardian.com/technology/2026/jan/11/google-ai...

  • Removing "some" doesn't make it worse. They didn't include "all" AI titles which it would. "Google removes AI health summaries after investigation finds dangerous flaws " is functionally equivalent to "Google removes some of its AI summaries after users’ health put at risk"

    Oh, and also, the Ars article itself still contains the word "some" (in my A/B test). It's the headline on HN that left it out. So your complaint is entirely invalid: "Google removes some AI health summaries after investigation finds “dangerous” flaws"

I'm telling you all this as a medical student who has used the latest and greatest models with proper prompting for the past 3 years in school:

There are a ton of misses. Especially on imaging. LLMs are not ready for consumer-facing health information yet. My guess is ~ 3-5 years. Right now, I see systems implementing note writing with LLMs, which is hit or miss (but will rapidly improve). Physicians want 1:1 customization. Have someone sit with them and talk through how they like their notes/set it up so the LLMs produce notes like that. Notes need to be customized at the physician level.

Also, the electronic health records any AI is trained on are loaded to the brim with borderline fraud/copy-paste notes. That's going to have to be reconciled. Do we have the LLMs add "Cranial Nerves II-X intact" even though the physician did not actually assess that? The physician would have documented that... No? But then you open up the physician to liability, which is a no-go for adopting software.

Building a SaaS MVP that's 80% of the way there? Sure. But medicine is not an MVP you cram into a pitch deck for a VC. 80% of the way there does not cut it here, especially if we're talking about consumer facing applications. Remember, the average American reads at a 6th grade reading level. Pause and let that sink in. You're probably surrounded by college educated people like yourself. It was a big shock when I started seeing patients, even though I am the first in my family to go to college. Any consumer-facing health AI tool needs to be bulletproof!!

Big Tech will not deliver us a healthcare utopia. Do not buy into their propaganda. They are leveraging post-pandemic increases in mistrust towards physicians as a springboard for half-baked solutions. Want to make $$$ doing the same thing? Do it in a different industry.

The AI summary is total garbage. Probably the most broken feature I've seen released in a while.

Huh... so Google doesn't trust its own product, but OpenAI and Anthropic are happy to lie? lol

Claude just added Health Connect integration for Android.

Meanwhile Copilot launched a full bot for it:

"Dos and don’ts of medical AI While AI is a useful tool that can help you understand medical information, it’s important to clarify what it’s designed to do (and what it isn’t).

Dos: Use AI as a reliable guide for finding doctors and understanding care options. Let AI act as an always available medical assistant that explains information clearly. Use AI as a transparent, unbiased source of clinically validated health content. Don’ts: Don’t use AI for medical diagnosis. If you’re concerned you may have a medical issue, you should seek the help of a medical professional. Don’t replace your doctor or primary care provider with an “AI doctor”. AI isn’t a doctor. You should always consult a professional before making any medical decisions. This clarity is what makes Copilot safe" https://www.microsoft.com/en-us/microsoft-copilot/for-indivi...

ChatGPT told me I am the healthiest guy in the world, and I believe it.

  • If it's any help, women appreciate confidence; yet the article is not about ChatGPT.