Comment by chrisweekly
8 days ago
Putting trust in an LLM is insanely dangerous. See this ChatGPT exchange for a stark example: https://amandaguinzburg.substack.com/p/diabolus-ex-machina
That kind of exchange is something I have seen from ChatGPT, and I think it represents a specific kind of failure case.
It is almost like schizophrenic behaviour: as if a premise is mistakenly hardwired into the brain as true, and all other reasoning adapts its view of the world to support that false premise.
In the case of ChatGPT, the problem seems to lie not with the LLM architecture itself but with an artifact of the rapid growth and change in the interface around it. They trained the model to read web pages and use the responses, but then placed it in an environment where, for whatever reason, it didn't actually fetch those pages. I can see that happening because of faults, or simply because changes in infrastructure, protocols, or policy put the LLM in an environment different from the one it expected. If it was trained on web requests that succeeded, it might not be able to handle requests that fail. Like the schizophrenic, it holds a false premise: it presumes success and responds as if there had been a success.
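To make that concrete, here is roughly the kind of plumbing I am imagining. This is a made-up sketch, not anyone's actual pipeline; fetch_page and ask_model are placeholder names I invented for illustration.

    import urllib.request

    def fetch_page(url: str) -> str:
        """Fetch a URL, returning an explicit error marker on failure."""
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.read().decode("utf-8", errors="replace")
        except OSError as exc:
            # If this branch returned "" instead (or the request was blocked
            # upstream and never got here), the model would only ever see
            # success-shaped input and could answer as if it had read the page.
            return f"[FETCH FAILED: {exc}]"

    def ask_model(prompt: str) -> str:
        # Placeholder for the real LLM call, so the sketch runs end to end.
        return f"(model answer goes here, given {len(prompt)} chars of context)"

    def answer_with_source(question: str, url: str) -> str:
        page = fetch_page(url)
        # Surfacing the failure in the prompt gives the model a chance to say
        # "I couldn't read the page" instead of inventing its contents.
        return ask_model(f"Question: {question}\n\nPage contents:\n{page}")

If the environment silently drops the fetch instead of making the failure visible, the model never gets a cue that anything went wrong.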
I haven't seen this behaviour so much on other platforms. A little bit in Claude, with regard to unreleased features that it can perceive via the interface but has not been trained to support or told about. It doesn't assume success on failure, but it does sometimes invent what those features are based on the names of reflected properties.
This is 40 screenshots of a writer at the New Yorker finding out that LLMs hallucinate, almost 3 years after GPT 2.0 was released. I've always held journalists in low regard, but how can one work in this field and only just now be finding out about the limitations of this technology?
3 years ago people understood LLMs hallucinated and shouldn't be trusted with important tasks.
Somehow in the 3 years since then the mindset has shifted to "well it works well enough for X, Y, and Z, maybe I'll talk to gpt about my mental health." Which, to me, makes that article much more timely than if it had been released 3 years ago.
I disagree with your premise that 3 years ago “people” knew about hallucinations or that these models shouldn’t be trusted.
I would argue that today most people do not understand that and actually trust LLM output more on face value.
Unless maybe you mean people = software engineers who at least dabble in some AI research or learning on the side
She's a writer submitting original short pieces to the New Yorker in hopes of being published, by no stretch a "journalist" let alone "at the New Yorker". I've always held judgmental HN commenters in low regard but how can one take the time to count the screenshots without picking up on the basic narrative context?
> She's a writer submitting original short pieces to the New Yorker in hopes of being published, by no stretch a "journalist" let alone "at the New Yorker".
Her substack bio reads: Writer/Photographer/Editor/New Yorker. Is the ordinary interpretation of that not: “I am a writer / photographer / editor at the New Yorker”?
Sycophancy is not the only problem (although it is a big one). I would simply never put my therapy conversations on a third-party server that a) definitely uses them for further training and b) may decide to sell them to, say, health insurance companies when they need some quick cash.
This is the second time this has been linked in the thread. Can you say more about why this interaction was “insanely dangerous”? I skim-read it and don't see the harm at a glance. It doesn't look like anything to me.
I have had a similar interaction when I was building an AI agent with tool use. It kept telling me it was calling the tools, and I went through my code to debug why the output wasn't showing up; it turned out it was lying, 'hallucinating' the responses. But it doesn't feel like 'hallucinating', it feels more like being fooled by its responses.
It is a really confronting thing to be tricked by a bot. I am an ML engineer with a master's in machine learning, experience at a research group in gen-AI (pre-ChatGPT), and I understand how these systems work from the underlying mathematics all the way through to the text being displayed on the screen. But I spent 30 minutes debugging my system because the bot had built up my trust, then lied to me that it was doing what it said it was doing, and was convincing enough in its hallucination for me to believe it.
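For what it's worth, the guard I would add now looks roughly like this. It's a hand-rolled sketch under my own assumptions, not a real framework; run_tool, handle_model_turn, and the toy tool registry are names I made up. The point is just to keep a ground-truth log of tool executions in my own code, so the model's claim that it called a tool can be checked against what actually ran.

    import json

    # Toy tool registry for the sketch; real tools would go here.
    TOOLS = {
        "get_time": lambda args: "2025-06-13T12:00:00Z",
    }

    call_log = []  # ground truth of what actually executed, outside the model's control

    def run_tool(name, args):
        call_log.append(name)
        return TOOLS[name](args)

    def handle_model_turn(turn: str) -> str:
        # The model only counts as having used a tool if this code ran it.
        # A turn that merely *claims* a tool was called never touches call_log,
        # so the discrepancy is visible immediately.
        try:
            request = json.loads(turn)  # expected shape: {"tool": "get_time", "args": {}}
        except json.JSONDecodeError:
            return f"[no tool executed this turn; executed so far: {call_log}]"
        return run_tool(request["tool"], request.get("args", {}))

    print(handle_model_turn('{"tool": "get_time", "args": {}}'))  # actually runs the tool
    print(handle_model_turn("I have called the get_time tool."))  # just a claim, nothing ran

Thirty seconds of looking at call_log would have saved me the 30 minutes of debugging.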
I cannot imagine how dangerous this skill could be when deployed against someone who doesn't know how the sausage is made. Think validating conspiracy theories and convincing humans into action.
It's funny, isn't it: it doesn't lie the way a human does. It doesn't experience any loss of confidence when it is caught saying totally made-up stuff. I'd be fascinated to know how much of what ChatGPT has told me is flat-out wrong.
> I cannot imagine how dangerous this skill could be when deployed against someone who doesn't know how the sausage is made. Think validating conspiracy theories and convincing humans into action.
It's unfortunately no longer hypothetical. There are some crazy stories showing up of people turning ChatGPT into their personal cult leader.
https://www.nytimes.com/2025/06/13/technology/chatgpt-ai-cha... ( https://archive.is/UUrO4 )
Absolutely wild that software this defective is being funded to the tune of $billions, and being touted as a seismic advance in technology.
Have human therapists ever wildly failed to merit trust?
Of course they have, but there are other humans, and untrustworthy humans can be removed from a position of trust by society.
How do we take action against untrustworthy LLMs?
The same way you do against humans: report them, to some combination of their management, regulatory bodies, and the media.
Not in a way that indicates humans can never be trusted, no.